Title: polyBERT: a chemical language model to enable fully machine-driven ultrafast polymer informatics
Abstract

Polymers are a vital part of everyday life. Their chemical universe is so large that it presents unprecedented opportunities as well as significant challenges to identify suitable application-specific candidates. We present a complete end-to-end machine-driven polymer informatics pipeline that can search this space for suitable candidates at unprecedented speed and accuracy. This pipeline includes a polymer chemical fingerprinting capability called polyBERT (inspired by Natural Language Processing concepts), and a multitask learning approach that maps the polyBERT fingerprints to a host of properties. polyBERT is a chemical linguist that treats the chemical structure of polymers as a chemical language. The present approach outstrips the best presently available concepts for polymer property prediction based on handcrafted fingerprint schemes in speed by two orders of magnitude while preserving accuracy, thus making it a strong candidate for deployment in scalable architectures including cloud infrastructures.

 
Award ID(s):
1941029
PAR ID:
10512267
Author(s) / Creator(s):
Publisher / Repository:
Springer Nature
Date Published:
Journal Name:
Nature Communications
Volume:
14
Issue:
1
ISSN:
2041-1723
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Molecular search is important in chemistry, biology, and informatics for identifying molecular structures within large data sets, improving knowledge discovery and innovation, and making chemical data FAIR (findable, accessible, interoperable, reusable). Search algorithms for polymers are significantly less developed than those for small molecules because polymer search relies on searching by polymer name, which can be challenging because polymer naming is overly broad (e.g., polyethylene), complicated for complex chemical structures, and often does not correspond to official IUPAC conventions. Chemical structure search in polymers is limited to substructures, such as monomers, without awareness of connectivity or topology. This work introduces a novel query language and graph traversal search algorithm for polymers that provides the first search method able to fully capture all of the chemical structures present in polymers. The BigSMARTS query language, an extension of the small-molecule SMARTS language, allows users to write queries that localize monomer and functional group searches to different parts of the polymer, like the middle block of a triblock, the side chain of a graft, and the backbone of a repeat unit. The substructure search algorithm is based on the traversal of graph representations of the generating functions for the stochastic graphs of polymers. Operationally, the algorithm first identifies cycles representing the monomers and then the end groups and finally performs a depth-first search to match entire subgraphs. To validate the algorithm, hundreds of queries were searched against hundreds of target chemistries and topologies from the literature, with approximately 440,000 query–target pairs. This tool provides a detailed algorithm that can be implemented in search engines to provide search results with full matching of the monomer connectivity and polymer topology.
  2. Abstract

    The accumulation of plastic waste, due to lack of recycling, has led to serious environmental pollution. Although mechanical recycling can alleviate this issue, it inevitably reduces the molecular weight and weakens the mechanical properties of materials and is not suitable for mixed materials. Chemical recycling, on the other hand, breaks the polymer into monomers or small‐molecule constituents, allowing for the preparation of materials of quality comparable to that of the virgin polymers and can be applied to mixed materials. Mechanochemical degradation and recycling leverages the advantages of mechanical techniques, such as scalability and efficient energy use, to achieve chemical recycling. We summarize recent progress in mechanochemical degradation and recycling of synthetic polymers, including both commercial polymers and those designed for more efficient mechanochemical degradation. We also point out the limitations of mechanochemical degradation and present our perspectives on how the challenges can be mitigated for a circular polymer economy.

     
  3. Abstract

    The accumulation of plastic waste, due to lack of recycling, has led to serious environmental pollution. Although mechanical recycling can alleviate this issue, it inevitably reduces the molecular weight and weakens the mechanical properties of materials and is not suitable for mixed materials. Chemical recycling, on the other hand, breaks the polymer into monomers or small‐molecule constituents, allowing for the preparation of materials of quality comparable to that of the virgin polymers and can be applied to mixed materials. Mechanochemical degradation and recycling leverages the advantages of mechanical techniques, such as scalability and efficient energy use, to achieve chemical recycling. We summarize recent progress in mechanochemical degradation and recycling of synthetic polymers, including both commercial polymers and those designed for more efficient mechanochemical degradation. We also point out the limitations of mechanochemical degradation and present our perspectives on how the challenges can be mitigated for a circular polymer economy.

     
  4. Abstract

    Polyolefins with periodic unsaturation in the backbone chain are sought after for synthesizing chemically recyclable polymers or telechelic polyolefin macromonomers. Here we introduce a bottom‐up synthesis of unsaturated high‐density polyethylene (HDPE) via copolymerization of ethylene with dimethyl 7‐oxabicyclo[2.2.1]hepta‐2,5‐diene‐3,5‐dicarboxylate followed by post‐polymerization retro‐Diels–Alder to unveil hidden double bonds in the polymer backbone. The incorporation of this “Trojan Horse” comonomer was varied and a series of unsaturated HDPE polymers with block lengths of 1.2, 1.9, and 3.5 kDa between double bonds was synthesized. Cross metathesis of unsaturated HDPE samples with 2‐hydroxyethyl acrylate yielded telechelic ester terminated PE macromonomers suitable for the preparation of ester‐linked PE. These materials were depolymerized and repolymerized, making them suitable candidates for chemical recycling. The ester‐linked PE displayed thermal and mechanical properties comparable to commercial HDPE.

     
  5. We propose a chemical language processing model to predict polymers’ glass transition temperature (Tg) through a polymer language (SMILES, Simplified Molecular Input Line Entry System) embedding and a recurrent neural network. This model receives only the SMILES strings of a polymer’s repeat units as inputs and treats the SMILES strings as sequential data at the character level. With this method, there is no need to calculate any additional molecular descriptors or fingerprints of polymers, which makes the approach computationally efficient. More importantly, it avoids the difficulty of generating molecular descriptors for repeat units containing the polymerization point ‘*’. Results show that the trained model demonstrates reasonable prediction performance on the Tg of unseen polymers. The model is further applied to high-throughput screening of an unlabeled polymer database to identify high-temperature polymers that are desired for applications in extreme environments. Our work demonstrates that the SMILES strings of polymer repeat units can be used as an effective feature representation to develop a chemical language processing model for predictions of polymer Tg. The framework of this model is general and can be used to construct structure–property relationships for other polymer properties.
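As an illustration of the character-level treatment of SMILES described in item 5, the following minimal sketch turns repeat-unit SMILES strings into padded integer sequences of the kind an embedding layer and recurrent network could consume. It is not the authors' implementation: the function names `build_vocab` and `encode`, the `<pad>` token, and the two example repeat units are assumptions chosen for illustration.

```python
from typing import Dict, List

PAD = "<pad>"  # assumed padding token, reserved at index 0


def build_vocab(smiles_list: List[str]) -> Dict[str, int]:
    """Map every character seen in the SMILES strings to an integer index."""
    chars = sorted({ch for s in smiles_list for ch in s})
    vocab = {PAD: 0}
    for ch in chars:
        vocab[ch] = len(vocab)
    return vocab


def encode(smiles: str, vocab: Dict[str, int], max_len: int) -> List[int]:
    """Convert one SMILES string to a fixed-length, right-padded index list."""
    if len(smiles) > max_len:
        raise ValueError("SMILES string is longer than max_len")
    ids = [vocab[ch] for ch in smiles]  # one index per character, '*' included
    return ids + [vocab[PAD]] * (max_len - len(ids))


# Hypothetical inputs: repeat units of polyethylene and polystyrene,
# with '*' marking the polymerization points.
repeat_units = ["*CC*", "*CC(*)c1ccccc1"]
vocab = build_vocab(repeat_units)
max_len = max(len(s) for s in repeat_units)
encoded = [encode(s, vocab, max_len) for s in repeat_units]
```

Because the polymerization point ‘*’ is handled like any other character, no descriptor calculation is needed before the sequences are passed to a model.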