The arXiv has collected 1.5 million pre-print articles over 28 years, hosting literature from scientific fields including Physics, Mathematics, and Computer Science. Each pre-print features text, figures, authors, citations, categories, and other metadata. These rich, multi-modal features, combined with the natural graph structure—created by citation, affiliation, and co-authorship—makes the arXiv an exciting candidate for benchmarking next-generation models. Here we take the first necessary steps toward this goal, by providing a pipeline which standardizes and simplifies access to the arXiv’s publicly available data. We use this pipeline to extract and analyze a 6.7 million edge citation graph, with an 11 billion word cor- pus of full-text research articles. We present some baseline classification results, and motivate application of more exciting generative graph models.
more »
« less
Computer-aided key step generation in alkaloid total synthesis
Efficient chemical synthesis is critical to satisfying future demands for medicines, materials, and agrochemicals. Retrosynthetic analysis of modestly complex molecules has been automated over the course of decades, but the combinatorial explosion of route possibilities has challenged computer hardware and software until only recently. Here, we explore a computational strategy that merges computer-aided synthesis planning with molecular graph editing to minimize the number of synthetic steps required to produce alkaloids. Our study culminated in an enantioselective three-step synthesis of (–)-stemoamide by leveraging high-impact key steps, which could be identified in computer-generated retrosynthesis plans using graph edit distances.
more »
« less
- Award ID(s):
- 2236215
- PAR ID:
- 10489489
- Publisher / Repository:
- American Association for the Advancement of Science
- Date Published:
- Journal Name:
- Science
- Volume:
- 379
- Issue:
- 6631
- ISSN:
- 0036-8075
- Page Range / eLocation ID:
- 453 to 457
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Syntax-guided synthesis has been a prevalent theme in various computer-aided programming systems. However, the domain of bit-vector synthesis poses several unique challenges that have not yet been sufficiently addressed and resolved. In this paper, we propose a novel synthesis approach that incorporates a distinct enumeration strategy based on various factors. Technically, this approach weighs in subexpression recurrence by term-graph-based enumeration, avoids useless candidates by example-guided filtration, prioritizes valuable components identified by large language models. This approach also incorporates a bottom-up deduction step to enhance the enumeration algorithm by considering subproblems that contribute to the deductive resolution. We implement all the enhanced enumeration techniques in our SyGuS solver DryadSynth, which outperforms state-of-the-art solvers in terms of the number of solved problems, execution time, and solution size. Notably, DryadSynth successfully solved 31 synthesis problems for the first time, including 5 renowned Hacker's Delight problems.more » « less
-
Reported herein is the first total synthesis of the acorane-type terpenoid rhodocorane L, isolated from the fungus Rhodotus palmatus. This unique tricyclic scaffold containing a spiro[4.5] bicycle has antifungal activity. Key steps include diastereoselective cuprate addition, gold-catalyzed dearomative spirocyclization, and intramolecular aldol. This is the first time that a catalyst-controlled asymmetric dearomative spirocyclization has been utilized in a total synthesis. Rhodocorane L was synthesized in 17 steps, as the longest linear sequence.more » « less
-
Abstract Herein, we describe our synthetic efforts toward the pupukeanane natural products, in which we have completed the first enantiospecific route to 2‐isocyanoallopupukeanane in 10 steps (formal synthesis), enabled by a key Pd‐mediated cyclization cascade. This subsequently facilitated an unprecedented bio‐inspired “contra‐biosynthetic” rearrangement, providing divergent access to 9‐isocyanopupukeanane in 15 steps (formal synthesis). Computational studies provide insight into the nature of this rearrangement.more » « less
-
Cabello, Sergio; Chen, Danny Z. (Ed.)In this paper, we consider the Visibility Graph Recognition and Reconstruction problems in the context of terrains. Here, we are given a graph G with labeled vertices v₀, v₁, …, v_{n-1} such that the labeling corresponds with a Hamiltonian path H. G also may contain other edges. We are interested in determining if there is a terrain T with vertices p₀, p₁, …, p_{n-1} such that G is the visibility graph of T and the boundary of T corresponds with H. G is said to be persistent if and only if it satisfies the so-called X-property and Bar-property. It is known that every "pseudo-terrain" has a persistent visibility graph and that every persistent graph is the visibility graph for some pseudo-terrain. The connection is not as clear for (geometric) terrains. It is known that the visibility graph of any terrain T is persistent, but it has been unclear whether every persistent graph G has a terrain T such that G is the visibility graph of T. There actually have been several papers that claim this to be the case (although no formal proof has ever been published), and recent works made steps towards building a terrain reconstruction algorithm for any persistent graph. In this paper, we show that there exists a persistent graph G that is not the visibility graph for any terrain T. This means persistence is not enough by itself to characterize the visibility graphs of terrains, and implies that pseudo-terrains are not stretchable.more » « less
An official website of the United States government

