The development of a materials synthesis route is usually based on heuristics and experience. An alternative is to use data-driven methods to learn the patterns of synthesis from past experience and apply them to predict the syntheses of novel materials. However, this route is impeded by the lack of a large-scale database of synthesis formulations. In this work, we applied advanced machine learning and natural language processing techniques to construct a dataset of 35,675 solution-based synthesis procedures extracted from the scientific literature. Each procedure contains essential synthesis information, including the precursors and target materials, their quantities, and the synthesis actions and corresponding attributes. Every procedure is also augmented with the reaction formula. Through this work, we are making freely available the first large dataset of solution-based inorganic materials synthesis procedures.
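To make the record structure described above concrete, here is a minimal sketch of how one procedure might be represented in code. All class and field names (`Material`, `Action`, `SynthesisProcedure`, `quantity`, `attributes`) are assumptions chosen for illustration and are not the schema of the published dataset; the ZnO entry is an invented example, not a record taken from it.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Material:
    """A precursor or target; `quantity` is free text, e.g. '0.1 M'."""
    formula: str
    quantity: Optional[str] = None


@dataclass
class Action:
    """A synthesis action with its attributes (temperature, time, ...)."""
    name: str
    attributes: Dict[str, str] = field(default_factory=dict)


@dataclass
class SynthesisProcedure:
    """Hypothetical container mirroring the fields named in the abstract."""
    targets: List[Material]
    precursors: List[Material]
    actions: List[Action]
    reaction: str

    def action_sequence(self) -> List[str]:
        """Return the ordered list of action names."""
        return [action.name for action in self.actions]


# Invented example entry, for illustration only.
example = SynthesisProcedure(
    targets=[Material("ZnO")],
    precursors=[Material("Zn(NO3)2", "0.1 M"), Material("NaOH", "0.2 M")],
    actions=[
        Action("mix"),
        Action("heat", {"temperature": "80 C", "time": "2 h"}),
        Action("dry"),
    ],
    reaction="Zn(NO3)2 + 2 NaOH -> ZnO + 2 NaNO3 + H2O",
)

print(example.action_sequence())
```

Keeping actions as an ordered list preserves the sequence of steps, which is what downstream pattern mining would most likely need.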
- Award ID(s):
- 1922372
- PAR ID:
- 10367521
- Publisher / Repository:
- Nature Publishing Group
- Date Published:
- Journal Name:
- Scientific Data
- Volume:
- 9
- Issue:
- 1
- ISSN:
- 2052-4463
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
Applying AI power to predict syntheses of novel materials requires high-quality, large-scale datasets. Extraction of synthesis information from scientific publications is still challenging, especially for extracting synthesis actions, because of the lack of a comprehensive labeled dataset using a solid, robust, and well-established ontology for describing synthesis procedures. In this work, we propose the first unified language of synthesis actions (ULSA) for describing inorganic synthesis procedures. We created a dataset of 3040 synthesis procedures annotated by domain experts according to the proposed ULSA scheme. To demonstrate the capabilities of ULSA, we built a neural network-based model to map arbitrary inorganic synthesis paragraphs into ULSA and used it to construct synthesis flowcharts for synthesis procedures. Analysis of the flowcharts showed that (a) ULSA covers essential vocabulary used by researchers when describing synthesis procedures and (b) it can capture important features of synthesis protocols. The present work focuses on the synthesis protocols for solid-state, sol–gel, and solution-based inorganic synthesis, but the language could be extended in the future to include other synthesis methods. This work is an important step towards creating a synthesis ontology and a solid foundation for autonomous robotic synthesis.
-
Digitizing large collections of scientific literature can enable new informatics approaches for scientific analysis and meta-analysis. However, most content in the scientific literature is locked up in written natural language, which is difficult to parse into databases using explicitly hard-coded classification rules. In this work, we demonstrate a semi-supervised machine-learning method to classify inorganic materials synthesis procedures from written natural language. Without any human input, latent Dirichlet allocation can cluster keywords into topics corresponding to specific experimental materials synthesis steps, such as “grinding” and “heating”, “dissolving” and “centrifuging”, etc. Guided by a modest amount of annotation, a random forest classifier can then associate these steps with different categories of materials synthesis, such as solid-state or hydrothermal synthesis. Finally, we show that a Markov chain representation of the order of experimental steps accurately reconstructs a flowchart of possible synthesis procedures. Our machine-learning approach offers a scalable way to unlock the large amount of inorganic materials synthesis information in the literature and to process it into a standardized, machine-readable database.
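The Markov chain idea mentioned above can be illustrated with a short sketch: given ordered lists of step labels, estimate first-order transition probabilities between steps. This is a generic illustration under stated assumptions, not the authors' implementation; the function name `transition_probabilities` and the toy sequences are hypothetical, with step labels borrowed from the examples in the abstract.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def transition_probabilities(
    sequences: List[List[str]],
) -> Dict[str, Dict[str, float]]:
    """Estimate first-order Markov transition probabilities between steps."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for steps in sequences:
        # Count each adjacent (current step, next step) pair.
        for current, following in zip(steps, steps[1:]):
            counts[current][following] += 1
    # Normalize the counts of each row so they sum to 1.
    return {
        step: {nxt: n / sum(nexts.values()) for nxt, n in nexts.items()}
        for step, nexts in counts.items()
    }


# Toy step sequences, invented for illustration only.
toy_sequences = [
    ["grinding", "heating"],
    ["grinding", "heating", "grinding", "heating"],
    ["dissolving", "heating", "centrifuging"],
]

probs = transition_probabilities(toy_sequences)
for step, nexts in probs.items():
    print(step, nexts)
```

The most probable transitions out of each step can then be drawn as edges of a flowchart; steps that never precede another step (here "centrifuging") simply have no outgoing row.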
-
Ultrathin and 2D magnetic materials have attracted a great deal of attention recently due to their potential applications in spintronics. Only a handful of stable ultrathin magnetic materials have been reported, but their high‐yield synthesis remains a challenge. Transition metal (e.g., manganese) nitrides are attractive candidates for spintronics due to their predicted high magnetic transition temperatures. Here, a lattice matching synthesis of ultrathin Mn3N2 is employed. Taking advantage of the lattice match between a KCl salt template and Mn3N2, this method yields the first ultrathin magnetic metal nitride via a solution‐based route. Mn3N2 flakes show intrinsic magnetic behavior even at 300 K, enabling potential room‐temperature applications. This synthesis procedure offers an approach to the discovery of other ultrathin or 2D metal nitrides.
-
Chalcogenide perovskites are promising semiconductor materials with attractive optoelectronic properties and appreciable stability, making them enticing candidates for photovoltaics and related electronic applications. Traditional synthesis methods for these materials have long suffered from high‐temperature requirements of 800–1000 °C. However, the recently developed solution processing route provides a way to circumvent this. By utilizing barium thiolate and ZrH2, this method is capable of synthesizing BaZrS3 perovskite at modest temperatures (500–600 °C), generating crystalline domains on the order of hundreds of nanometers in size. Herein, a systematic study of this solution processing route is done to gain a mechanistic understanding of the process and to supplement the development of device quality fabrication methodologies. A barium polysulfide liquid flux is identified as playing a key role in the rapid synthesis of large‐grain BaZrS3 perovskite at modest temperatures. Additionally, this mechanism is successfully extended to the related BaHfS3 perovskite. The reported findings identify viable precursors, key temperature regimes, and reaction conditions that are likely to enable the large‐grain chalcogenide perovskite growth, essential toward the formation of device‐quality thin films.
-
Materials discovery has become significantly facilitated and accelerated by high-throughput ab-initio computations. This ability to rapidly design interesting novel compounds has displaced the materials innovation bottleneck to the development of synthesis routes for the desired material. As there is no fundamental theory for materials synthesis, one might attempt a data-driven approach for predicting inorganic materials synthesis, but this is impeded by the lack of a comprehensive database containing synthesis processes. To overcome this limitation, we have generated a dataset of “codified recipes” for solid-state synthesis automatically extracted from scientific publications. The dataset consists of 19,488 synthesis entries retrieved from 53,538 solid-state synthesis paragraphs by using text mining and natural language processing approaches. Every entry contains information about target material, starting compounds, operations used and their conditions, as well as the balanced chemical equation of the synthesis reaction. The dataset is publicly available and can be used for data mining of various aspects of inorganic materials synthesis.