-
Abstract
Polymer–protein hybrids are intriguing materials that can bolster protein stability in non‐native environments, thereby enhancing their utility in diverse medicinal, commercial, and industrial applications. One stabilization strategy involves designing synthetic random copolymers with compositions attuned to the protein surface, but rational design is complicated by the vast chemical and composition space. Here, a strategy is reported to design protein‐stabilizing copolymers based on active machine learning, facilitated by automated material synthesis and characterization platforms. The versatility and robustness of the approach are demonstrated by the successful identification of copolymers that preserve, or even enhance, the activity of three chemically distinct enzymes following exposure to thermal denaturing conditions. Although systematic screening yields mixed success, active learning appropriately identifies unique and effective copolymer chemistries for the stabilization of each enzyme. Overall, this work broadens the capabilities to design fit‐for‐purpose synthetic copolymers that promote or otherwise manipulate protein activity, with extensions toward the design of robust polymer–protein hybrid materials.
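To give a sense of how quickly the composition space grows, the sketch below counts and enumerates random-copolymer compositions on a discrete grid. It assumes compositions are mole fractions restricted to multiples of 1/resolution (a hypothetical discretization, not one taken from the abstract); the count then follows the stars-and-bars formula C(resolution + k - 1, k - 1) for k monomer types.

```python
from math import comb


def count_compositions(num_monomers, resolution):
    """Number of ways to split `resolution` equal parts among `num_monomers` monomer types."""
    return comb(resolution + num_monomers - 1, num_monomers - 1)


def enumerate_compositions(num_monomers, resolution):
    """Yield every grid composition as a tuple of mole fractions summing to 1."""
    def parts(remaining, slots):
        # Distribute `remaining` indivisible parts over `slots` monomer types.
        if slots == 1:
            yield (remaining,)
            return
        for first in range(remaining + 1):
            for rest in parts(remaining - first, slots - 1):
                yield (first,) + rest

    for p in parts(resolution, num_monomers):
        yield tuple(n / resolution for n in p)


# Hypothetical example: 10 candidate monomers at 5 mol% resolution.
print(count_compositions(10, 20))  # 10015005
```

Even before accounting for chain length or monomer chemistry, a modest grid like this one yields roughly 10^7 candidate compositions, which is why sample-efficient strategies such as active learning are attractive compared with exhaustive screening.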
-
Abstract
Among the many molecules that contribute to glial scarring, chondroitin sulfate proteoglycans (CSPGs) are known to be potent inhibitors of neuronal regeneration. Chondroitinase ABC (ChABC), a bacterial lyase, degrades the glycosaminoglycan (GAG) side chains of CSPGs and promotes tissue regeneration. However, ChABC is thermally unstable and loses all activity within a few hours at 37 °C under dilute conditions. To overcome this limitation, the discovery of a diverse set of tailor‐made random copolymers that complex and stabilize ChABC at physiological temperature is reported. The copolymer designs, defined by copolymer chain length and composition, are identified using an active machine learning paradigm that involves iterative copolymer synthesis, testing of ChABC thermostability upon copolymer complexation, Gaussian process regression modeling, and Bayesian optimization. Copolymers are synthesized by automated PET‐RAFT, and ChABC thermostability is assessed by retained enzyme activity (REA) after 24 h at 37 °C. Significant improvements in REA are demonstrated over three iterations of active learning, along with the identification of exceptionally high‐performing copolymers. Most remarkably, one designed copolymer promotes residual ChABC activity near 30% even after one week and notably outperforms other common stabilization methods for ChABC. Together, these results highlight a promising pathway toward sustained tissue regeneration.
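The active-learning loop described above pairs Gaussian process regression with Bayesian optimization. A minimal NumPy sketch of one iteration follows. It assumes an RBF kernel with fixed hyperparameters, copolymer designs encoded as numeric feature vectors (for example, composition fractions and a scaled chain length), and expected improvement (EI) as the acquisition function. The abstract does not specify the kernel or acquisition function, so these choices are illustrative only.

```python
import math

import numpy as np


def rbf_kernel(A, B, length_scale=0.2, variance=1.0):
    """Squared-exponential kernel between rows of A (n, d) and B (m, d)."""
    sq_dist = (np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :]
               - 2.0 * A @ B.T)
    return variance * np.exp(-0.5 * np.maximum(sq_dist, 0.0) / length_scale**2)


def gp_posterior(X_train, y_train, X_test, noise=1e-6):
    """Posterior mean and std of a zero-mean GP fit to mean-centered targets."""
    y_mean = y_train.mean()
    K = rbf_kernel(X_train, X_train) + noise * np.eye(len(X_train))
    L = np.linalg.cholesky(K)  # K = L @ L.T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train - y_mean))
    K_star = rbf_kernel(X_train, X_test)
    mean = K_star.T @ alpha + y_mean
    v = np.linalg.solve(L, K_star)
    var = np.diag(rbf_kernel(X_test, X_test)) - np.sum(v**2, axis=0)
    return mean, np.sqrt(np.maximum(var, 1e-12))


_erf = np.vectorize(math.erf)


def expected_improvement(mean, std, best, xi=0.01):
    """EI for maximization of the target (e.g., retained enzyme activity)."""
    z = (mean - best - xi) / std
    cdf = 0.5 * (1.0 + _erf(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)
    return (mean - best - xi) * cdf + std * pdf


def propose_next(X_train, y_train, candidates):
    """Return the index of the candidate design with the highest EI."""
    mean, std = gp_posterior(X_train, y_train, candidates)
    return int(np.argmax(expected_improvement(mean, std, y_train.max())))


# Hypothetical 1-D design variable (e.g., fraction of one monomer) and REA values.
X = np.array([[0.1], [0.5], [0.9]])
y = np.array([0.2, 0.6, 0.3])
grid = np.linspace(0.0, 1.0, 101).reshape(-1, 1)
print(grid[propose_next(X, y, grid)])
```

In a campaign like the one described, the proposed design would be synthesized and assayed, its REA appended to `(X, y)`, and the loop repeated. Practical implementations usually also fit the kernel hyperparameters and propose batches of designs rather than a single point per iteration.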
-
Phase separation in multicomponent mixtures is of significant interest in both fundamental research and technology. Although the thermodynamic principles governing phase equilibria are straightforward, practical determination of equilibrium phases and constituent compositions for multicomponent systems is often laborious and computationally intensive. Here, we present a machine-learning workflow that simplifies and accelerates phase-coexistence calculations. We specifically analyze the capabilities of neural networks to predict the number, composition, and relative abundance of equilibrium phases of systems described by Flory-Huggins theory. We find that incorporating physics-informed material constraints into the neural network architecture improves the prediction of equilibrium compositions compared with standard neural networks, leaving only minor errors along the boundaries of the stable region. However, introducing additional physics-informed losses does not lead to significant further improvement. The remaining errors can be virtually eliminated by using machine-learning predictions as a warm start for a subsequent optimization routine. This work provides a promising pathway to efficiently characterize multicomponent phase coexistence.
Free, publicly accessible full text available December 24, 2025.
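The inputs to these phase-coexistence predictions map onto the Flory-Huggins free energy of mixing. Assuming the standard lattice form per site in units of k_BT, f = Σ_i (φ_i / v_i) ln φ_i + χ_AB φ_A φ_B + χ_BC φ_B φ_C + χ_AC φ_A φ_C, with v_i as relative molar volumes and φ_C = 1 - φ_A - φ_B by incompressibility, the sketch below evaluates f for one input vector ordered as in the dataset described further on, x = (χ_AB, χ_BC, χ_AC, v_A, v_B, v_C, φ_A, φ_B). The exact convention used in the paper may differ.

```python
import math


def flory_huggins_free_energy(x):
    """Mixing free energy per lattice site (units of k_B T) for a ternary A/B/C mixture.

    x = (chi_AB, chi_BC, chi_AC, v_A, v_B, v_C, phi_A, phi_B); phi_C follows
    from incompressibility (volume fractions sum to 1).
    """
    chi_AB, chi_BC, chi_AC, v_A, v_B, v_C, phi_A, phi_B = x
    phi_C = 1.0 - phi_A - phi_B
    if min(phi_A, phi_B, phi_C) <= 0.0:
        raise ValueError("all volume fractions must be strictly positive")
    # Translational (entropic) contribution.
    entropic = sum(phi / v * math.log(phi)
                   for phi, v in ((phi_A, v_A), (phi_B, v_B), (phi_C, v_C)))
    # Pairwise interaction (enthalpic) contribution.
    enthalpic = chi_AB * phi_A * phi_B + chi_BC * phi_B * phi_C + chi_AC * phi_A * phi_C
    return entropic + enthalpic


# Ideal equimolar mixture of equal-size components: reduces to ln(1/3).
print(flory_huggins_free_energy((0, 0, 0, 1, 1, 1, 1/3, 1/3)))
```

A feed phase separates when some set of coexisting phases, weighted by their relative abundances, has a lower total free energy than the homogeneous mixture; locating that minimum is the calculation the described workflow aims to accelerate.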
-
Phase separation in multicomponent mixtures is of significant interest in both fundamental research and technology. Although the thermodynamic principles governing phase equilibria are straightforward, practical determination of equilibrium phases and constituent compositions for multicomponent systems is often laborious and computationally intensive. Here, we present a machine-learning workflow that simplifies and accelerates phase-coexistence calculations. We specifically analyze the capabilities of neural networks to predict the number, composition, and relative abundance of equilibrium phases of systems described by Flory-Huggins theory. We find that incorporating physics-informed material constraints into the neural network architecture improves the prediction of equilibrium compositions compared with standard neural networks, leaving only minor errors along the boundaries of the stable region. However, introducing additional physics-informed losses does not lead to significant further improvement. The remaining errors can be virtually eliminated by using machine-learning predictions as a warm start for a subsequent optimization routine. This work provides a promising pathway to efficiently characterize multicomponent phase coexistence.
Free, publicly accessible full text available September 23, 2025.
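The warm-start idea can be illustrated on a reduced example: a symmetric binary Flory-Huggins blend (v_A = v_B = 1), where the coexisting compositions are φ and 1 - φ and the binodal satisfies ln[φ/(1 - φ)] = χ(2φ - 1). In the sketch below, a Newton solve stands in for the optimization routine (which is not specified here), and `phi0` plays the role of a machine-learning prediction; both are assumptions for illustration.

```python
import math


def binodal_residual(phi, chi):
    """Zero at the A-poor binodal of a symmetric binary Flory-Huggins blend."""
    return math.log(phi / (1.0 - phi)) - chi * (2.0 * phi - 1.0)


def solve_binodal(chi, phi0, tol=1e-10, max_iter=50):
    """Newton iteration from an initial guess phi0 (e.g., a model prediction).

    Returns (phi, iterations). Iterates are clamped to (0, 1/2) so they stay
    on the A-poor branch; the partner phase has composition 1 - phi.
    """
    phi = phi0
    for iteration in range(1, max_iter + 1):
        g = binodal_residual(phi, chi)
        dg = 1.0 / phi + 1.0 / (1.0 - phi) - 2.0 * chi  # derivative of the residual
        step = g / dg
        phi = min(max(phi - step, 1e-12), 0.5 - 1e-12)
        if abs(step) < tol:
            return phi, iteration
    raise RuntimeError("Newton iteration did not converge")


# chi = 3 lies above the critical value chi_c = 2, so two phases coexist.
phi, n_iter = solve_binodal(3.0, phi0=0.1)
print(phi, 1.0 - phi, n_iter)
```

A guess close to the true binodal converges in a handful of iterations, whereas a guess inside the spinodal region (here φ(1 - φ) > 1/(2χ)) can be pulled toward the trivial root φ = 1/2 instead. This sensitivity to the starting point is why a good warm start helps.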
-
Active learning and design-build-test-learn strategies are increasingly employed to accelerate materials discovery and characterization. Many data-driven materials design campaigns target solutions within constrained domains such as synthesizability, stability, solubility, recyclability, and toxicity. Lack of knowledge about these constraints can hinder design efficiency by producing samples that fail to meet required thresholds. Acquiring this knowledge during the design campaign is inefficient, and effective classification of common materials constraints transcends specific design objectives. However, there is no consensus on the most data-efficient algorithm for classifying whether a material satisfies a constraint. To address this gap, we comprehensively compare the performance of 100 strategies designed to classify chemical and materials behavior. Performance is assessed across 31 classification tasks sourced from the literature in chemical and materials science. From these results, we recommend best practices for building data-efficient classifiers, showing that neural network- and random forest-based active learning algorithms are the most efficient across tasks. We also show that classification task complexity can be quantified based on task metafeatures, most notably the noise-to-signal ratio. Overall, this work provides a comprehensive survey of data-efficient classification strategies, identifies attributes of top-performing strategies, and suggests avenues for further study.
Free, publicly accessible full text available September 16, 2025.
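The pool-based active-learning classifiers compared above share a common loop: fit a model to the labeled samples, query the unlabeled candidate the model is least certain about, and repeat. A minimal pure-Python sketch follows. It swaps in a kernel-weighted (Nadaraya-Watson) probability estimate for the neural-network and random-forest models recommended in the abstract, and it uses a hypothetical one-dimensional constraint (a material "passes" when x < 0.33) in place of a real experimental oracle.

```python
import math


def pass_probability(x, labeled, length=0.05):
    """Kernel-weighted estimate of P(constraint satisfied | x) from (x_j, y_j) pairs."""
    num = den = 0.0
    for x_j, y_j in labeled:
        w = math.exp(-((x - x_j) ** 2) / (2.0 * length ** 2))
        num += w * y_j
        den += w
    return num / den if den > 0.0 else 0.5  # uninformative if all weights underflow


def active_classify(pool, oracle, seed_idx, n_queries, length=0.05):
    """Uncertainty sampling: repeatedly label the point whose probability is closest to 0.5."""
    labels = {i: oracle(pool[i]) for i in seed_idx}
    for _ in range(n_queries):
        data = [(pool[i], y) for i, y in labels.items()]
        unlabeled = [i for i in range(len(pool)) if i not in labels]
        if not unlabeled:
            break
        query = min(unlabeled,
                    key=lambda i: abs(pass_probability(pool[i], data, length) - 0.5))
        labels[query] = oracle(pool[query])
    data = [(pool[i], y) for i, y in labels.items()]
    predictions = [1 if pass_probability(x, data, length) >= 0.5 else 0 for x in pool]
    return predictions, labels


def oracle(x):
    """Hypothetical constraint: candidates with x < 0.33 satisfy it."""
    return 1 if x < 0.33 else 0


pool = [i / 100 for i in range(101)]
predictions, labels = active_classify(pool, oracle, seed_idx=[0, 100], n_queries=10)
accuracy = sum(p == oracle(x) for p, x in zip(predictions, pool)) / len(pool)
print(f"{len(labels)} labels, pool accuracy {accuracy:.2f}")
```

Because queries concentrate where the predicted probability is near 0.5, most of the labeling budget is spent near the constraint boundary rather than deep inside the feasible or infeasible regions.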
-
This dataset holds 1036 ternary phase diagrams, along with how points on each diagram phase separate, if they do. The data is provided as a serialized object using the `pickle` Python module and was compiled using Python version 3.8.

References
The specific applications and analyses of the data are described in:
1. Dhamankar, S.; Jiang, S.; Webb, M.A. "Accelerating Multicomponent Phase-Coexistence Calculations with Physics-informed Neural Networks"

Usage
To access the data in the .pickle file, users can execute the following:

    import os
    import pickle

    # LOAD SIMULATION DATA
    DATA_DIR = "your/custom/dir/"
    filename = os.path.join(DATA_DIR, "data_clean.pickle")
    with open(filename, "rb") as handle:
        (x, y_c, y_r, phase_idx, num_phase, max_phase) = pickle.load(handle)

x: Input x = (χ_AB, χ_BC, χ_AC, v_A, v_B, v_C, φ_A, φ_B) ∈ ℝ^8.
y_c: Output one-hot encoded classification vector y_c ∈ ℝ^3.
y_r: Output equilibrium composition and abundance vector y_r = (φ_A^α, φ_B^α, φ_A^β, φ_B^β, φ_A^γ, φ_B^γ, w^α, w^β, w^γ) ∈ ℝ^9.
phase_idx: A single integer indicating which unique phase system the sample belongs to.
num_phase: A single integer indicating the number of equilibrium phases the input splits into.
max_phase: A single integer indicating the maximum number of equilibrium phases the system splits into.

Help, Suggestions, Corrections?
If you need help, have suggestions, identify issues, or have corrections, please send your comments to Shengli Jiang at sj0161@princeton.edu.

GitHub
Additional data and code relevant to this study are also accessible at https://github.com/webbtheosim/ml-ternary-phase
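Once loaded, each row of `x`, `y_c`, and `y_r` can be unpacked with plain Python. The helpers below are a sketch built on two assumptions that the dataset description does not state explicitly: that index k of `y_c` marks k + 1 coexisting phases, and that w^α, w^β, w^γ are phase volume fractions, so the lever rule (mass balance) should reproduce the feed composition (φ_A, φ_B). Both should be checked against the data before relying on them; the function names are hypothetical.

```python
def num_phases_from_onehot(y_c_row):
    """Assumed mapping: one-hot index k -> k + 1 equilibrium phases."""
    return max(range(len(y_c_row)), key=lambda k: y_c_row[k]) + 1


def lever_rule_residual(x_row, y_r_row):
    """Largest deviation between the feed (phi_A, phi_B) and the abundance-weighted phases."""
    phi_A, phi_B = x_row[6], x_row[7]
    phases = [(y_r_row[0], y_r_row[1]),   # alpha
              (y_r_row[2], y_r_row[3]),   # beta
              (y_r_row[4], y_r_row[5])]   # gamma
    weights = y_r_row[6:9]
    mix_A = sum(w * a for w, (a, _) in zip(weights, phases))
    mix_B = sum(w * b for w, (_, b) in zip(weights, phases))
    return max(abs(mix_A - phi_A), abs(mix_B - phi_B))


# Hand-made, arithmetic-only two-phase example (not a real Flory-Huggins solution).
x_row = (2.0, 0.0, 0.0, 1.0, 1.0, 1.0, 0.3, 0.3)
y_r_row = (0.1, 0.2, 0.5, 0.4, 0.0, 0.0, 0.5, 0.5, 0.0)
print(num_phases_from_onehot((0, 1, 0)), lever_rule_residual(x_row, y_r_row))
```

Applied over the loaded arrays, for example `max(lever_rule_residual(xr, yr) for xr, yr in zip(x, y_r))`, a uniformly small residual would support the volume-fraction interpretation of the abundances.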
-
The emergence of data-intensive scientific discovery and machine learning has dramatically changed the way in which scientists and engineers approach materials design. Nevertheless, for designing macromolecules or polymers, one limitation is the lack of appropriate methods or standards for converting systems into chemically informed, machine-readable representations. This featurization process is critical to building predictive models that can guide polymer discovery. Although standard molecular featurization techniques have been deployed on homopolymers, such approaches capture neither the multiscale nature nor the topological complexity of copolymers, and they have limited application to systems that cannot be characterized by a single repeat unit. Herein, we present, evaluate, and analyze a series of featurization strategies suitable for copolymer systems. These strategies are systematically examined in diverse prediction tasks sourced from four distinct datasets that enable understanding of how featurization can impact copolymer property prediction. Based on this comparative analysis, we suggest directly encoding polymer size in polymer representations when possible, adopting topological descriptors or convolutional neural networks when the precise polymer sequence is known, and using chemically informed unit representations when developing extrapolative models. These results provide guidance and future directions regarding polymer featurization for copolymer design by machine learning.
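To make the contrast between composition-only and sequence-aware representations concrete, the sketch below builds a hypothetical feature vector for a copolymer given as a sequence of monomer labels: chain length (encoding polymer size directly), the mole fraction of each monomer type, and the fraction of adjacent monomer pairs that are unlike, a simple sequence descriptor. These particular features are illustrative and are not the representations evaluated in the paper.

```python
def featurize_copolymer(sequence, monomers):
    """Return [chain length, fraction of each monomer in `monomers`, fraction of unlike neighbors].

    `sequence` may be a string of single-character labels or a list of labels.
    """
    n = len(sequence)
    if n == 0:
        raise ValueError("sequence must contain at least one monomer")
    fractions = [sequence.count(m) / n for m in monomers]
    if n > 1:
        # Fraction of the n - 1 adjacent pairs whose two monomers differ.
        unlike = sum(1 for a, b in zip(sequence, sequence[1:]) if a != b) / (n - 1)
    else:
        unlike = 0.0
    return [float(n)] + fractions + [unlike]


blocky = featurize_copolymer("AAABBB", ["A", "B"])
alternating = featurize_copolymer("ABABAB", ["A", "B"])
print(blocky)
print(alternating)
```

The two example chains share the same length and composition, so a composition-only representation cannot tell them apart; the unlike-neighbor fraction separates them, which illustrates why sequence-aware descriptors are useful when the precise sequence is known.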