Title: TopoSolv-6k: Radii of Gyration and Shear-Rate Dependent Viscosities for Topologically and Chemically Diverse Coarse-Grained Polymers
Dataset Description
This dataset contains 6710 structural configurations and solvophobicity values for topologically and chemically diverse coarse-grained polymer chains. Additionally, 480 polymers include shear-rate-dependent viscosity profiles at a polymer concentration of 2 wt%. The data is provided as serialized objects using the pickle Python module. All files were generated with Python 3.10.

Data
There are three pickle files containing serialized Python objects:

data_aug10.pickle: the coarse-grained polymer dataset with 6710 entries. Each entry includes:
  - polymer graph
  - squared radius of gyration (at lambda = 0)
  - solvophobicity (lambda)
  - bead count (N)
  - chain virial number (Xi)
topo_param_visc.pickle: shear-rate-dependent viscosity profiles of 480 polymer systems.
target_curves.pickle: 30 target viscosity profiles used for active learning.

Usage
To load the dataset stored in data_aug10.pickle, use the following code:

    import pickle

    with open("data_aug10.pickle", "rb") as handle:
        (
            (x_train, y_train, c_train, l_train, graph_train),
            (x_valid, y_valid, c_valid, l_valid, graph_valid),
            (x_test, y_test, c_test, l_test, graph_test),
            NAMES,
            SCALER,
            SCALER_y,
            le,
        ) = pickle.load(handle)

  - x: node features for each polymer graph
  - y: property labels (the target values to be predicted)
  - c: topological class indices
  - l: topological descriptors
  - graph: NetworkX graphs representing polymer topology
  - NAMES: list of topological class names
  - SCALER: fitted scaler for the topological descriptors (l)
  - SCALER_y: fitted scaler for the property labels (y)
  - le: label encoder for the topological class indices

To load the dataset stored in topo_param_visc.pickle, use the following code:

    import pickle

    with open("topo_param_visc.pickle", "rb") as handle:
        desc_all, ps_all, curve_all, shear_rate, graph_all = pickle.load(handle)

  - desc_all: topological descriptors for each polymer graph
  - ps_all: fitted Carreau–Yasuda model parameters
  - curve_all: fitted viscosity curves
  - shear_rate: shear rates corresponding to each viscosity curve
  - graph_all: polymer graphs represented as NetworkX objects

The 480 entries are ordered as follows:
  - first 30: seed dataset
  - next 150: 5 iterations (30 each) of class-balanced space-filling sampling
  - following 150: space-filling sampling without class balancing
  - final 150: active-learning samples

To load the dataset stored in target_curves.pickle, use the following code:

    import pickle

    with open("target_curves.pickle", "rb") as handle:
        data = pickle.load(handle)

    curves = data["curves"]
    params = data["params"]
    shear_rate = data["xx"]

  - curves: target viscosity curves used as design objectives
  - params: Carreau–Yasuda model parameters fitted to the target curves
  - shear_rate: shear-rate values associated with the target curves

Help, Suggestions, Corrections?
If you need help, have suggestions, identify issues, or have corrections, please send your comments to Shengli Jiang at sj0161@princeton.edu

GitHub
Additional data and code relevant to this study are available at https://github.com/webbtheosim/cg-topo-solv
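Because the 480 viscosity entries are described as a sequence of acquisition stages (30 seed, then three blocks of 150), the stages can be recovered by index. The sketch below assumes that every array in topo_param_visc.pickle is indexed along its first axis in exactly that order; the helper names `stage_slices` and `iteration_slices` are hypothetical and not part of the dataset or its repository.

```python
# Hypothetical helpers: map each acquisition stage of the 480 viscosity
# entries to a slice, ASSUMING entries are stored in the order listed above.
STAGE_SIZES = [
    ("seed", 30),
    ("space_filling_class_balanced", 150),  # 5 iterations x 30
    ("space_filling", 150),                 # no class balancing
    ("active_learning", 150),
]


def stage_slices(stage_sizes=STAGE_SIZES):
    """Return a dict mapping stage name to a slice over the first axis."""
    slices = {}
    start = 0
    for name, size in stage_sizes:
        slices[name] = slice(start, start + size)
        start += size
    return slices


def iteration_slices(stage_slice, batch_size=30):
    """Split one stage into consecutive batches (e.g. 5 iterations of 30)."""
    return [
        slice(i, min(i + batch_size, stage_slice.stop))
        for i in range(stage_slice.start, stage_slice.stop, batch_size)
    ]


if __name__ == "__main__":
    slices = stage_slices()
    for name, s in slices.items():
        print(name, s.start, s.stop)
```

With the pickle loaded as shown above, `desc_all[stage_slices()["active_learning"]]` would select the active-learning descriptors, provided the stored order matches this assumption.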
Award ID(s):
2320649
PAR ID:
10636750
Author(s) / Creator(s):
;
Publisher / Repository:
Zenodo
Date Published:
Format(s):
Medium: X
Right(s):
Creative Commons Attribution 4.0 International
Sponsoring Org:
National Science Foundation
More Like this
  1. Revision: This revision includes four independent trajectory values of the ensemble averages of the mean squared radii of gyration, together with their standard deviations, which can be used to compute statistical measures such as the standard error. This distribution provides access to 18,450 configurations of coarse-grained polymers. The data is provided as serialized objects using the 'pickle' Python module and in CSV format. The data was compiled using Python 3.8.

References
The specific applications and analyses of the data are described in [1].
[1] Jiang, S.; Webb, M.A. "Physics-Guided Neural Networks for Transferable Prediction of Polymer Properties"

Data
There are seven .pickle files that contain serialized Python objects:
  - pattern_graph_data_*_*_rg_new.pickle: squared radii of gyration distribution from MD simulation. The numbers indicate the molecular weight range.
  - rg2_baseline_*_new.pickle: squared radii of gyration distribution from Gaussian chain theoretical prediction.
  - delta_data_v0314.pickle: torch_geometric training data.

Usage
To access the data in the .pickle files, users can execute the following:

    import os
    import pickle

    # LOAD SIMULATION DATA
    DATA_DIR = "your/custom/dir/"
    mw = 40  # or 90, 190
    filename = os.path.join(DATA_DIR, f"pattern_graph_data_{mw}_{mw+20}_rg_new.pickle")
    with open(filename, "rb") as handle:
        graph = pickle.load(handle)
        label = pickle.load(handle)
        desc = pickle.load(handle)
        meta = pickle.load(handle)
        mode = pickle.load(handle)
        rg2_mean = pickle.load(handle)
        rg2_std = pickle.load(handle) ** 0.5  # stored as a variance; take the square root

    # combine asymmetric and symmetric star polymers
    label[label == "stara"] = "star"
    # combine bottlebrush and other comb polymers
    label[label == "bottlebrush"] = "comb"

    # LOAD GAUSSIAN CHAIN THEORETICAL DATA
    with open(os.path.join(DATA_DIR, f"rg2_baseline_{mw}_new.pickle"), "rb") as handle:
        rg2_mean_theo = pickle.load(handle)[:, 0]
        rg2_std_theo = pickle.load(handle)[:, 0]

  - graph: NetworkX graph representations of polymers.
  - label: architectural classes of polymers (e.g., linear, cyclic, star, branch, comb, dendrimer).
  - desc: topological descriptors (optional).
  - meta: identifiers for unique architectures (optional).
  - mode: identifiers for unique chemical patterns (optional).
  - rg2_mean: mean squared radii of gyration from simulations.
  - rg2_std: corresponding standard deviations from simulations.
  - rg2_mean_theo: mean squared radii of gyration from theoretical models.
  - rg2_std_theo: corresponding standard deviations from theoretical models.

Help, Suggestions, Corrections?
If you need help, have suggestions, identify issues, or have corrections, please send your comments to Shengli Jiang at sj0161@princeton.edu

GitHub
Additional data and code relevant to this study are available at https://github.com/webbtheosim/gcgnn
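The revision note says that four independent trajectory values are provided so that statistics such as the standard error can be computed. A minimal sketch, assuming the four per-trajectory ensemble averages for one polymer are available as a plain sequence of numbers (the exact array layout inside the pickle and CSV files is not specified here); `standard_error` is a hypothetical helper.

```python
import math
import statistics


def standard_error(values):
    """Standard error of the mean from independent trajectory estimates.

    Uses the sample standard deviation (n - 1 in the denominator) divided
    by sqrt(n), which assumes the trajectories are statistically independent.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two independent values")
    return statistics.stdev(values) / math.sqrt(n)


if __name__ == "__main__":
    # Illustrative numbers only, not values taken from the dataset.
    rg2_runs = [10.0, 12.0, 11.0, 13.0]
    print(statistics.mean(rg2_runs), standard_error(rg2_runs))
```

For four trajectories this reduces to the sample standard deviation divided by 2.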
  2. This dataset holds 1036 ternary phase diagrams and describes how points on each diagram phase-separate, if they do. The data is provided as a serialized object using the 'pickle' Python module. The data was compiled using Python 3.8.

References
The specific applications and analyses of the data are described in [1].
[1] Dhamankar, S.; Jiang, S.; Webb, M.A. "Accelerating Multicomponent Phase-Coexistence Calculations with Physics-informed Neural Networks"

Usage
To access the data in the .pickle file, users can execute the following:

    import os
    import pickle

    # LOAD SIMULATION DATA
    DATA_DIR = "your/custom/dir/"
    filename = os.path.join(DATA_DIR, "data_clean.pickle")
    with open(filename, "rb") as handle:
        (x, y_c, y_r, phase_idx, num_phase, max_phase) = pickle.load(handle)

  - x: input x = (χ_AB, χ_BC, χ_AC, v_A, v_B, v_C, φ_A, φ_B) ∈ ℝ^8.
  - y_c: output one-hot encoded classification vector y_c ∈ ℝ^3.
  - y_r: output equilibrium composition and abundance vector y_r = (φ_A^α, φ_B^α, φ_A^β, φ_B^β, φ_A^γ, φ_B^γ, w^α, w^β, w^γ) ∈ ℝ^9.
  - phase_idx: a single integer indicating which unique phase system the sample belongs to.
  - num_phase: a single integer indicating the number of equilibrium phases the input splits into.
  - max_phase: a single integer indicating the maximum number of equilibrium phases the system splits into.

Help, Suggestions, Corrections?
If you need help, have suggestions, identify issues, or have corrections, please send your comments to Shengli Jiang at sj0161@princeton.edu

GitHub
Additional data and code relevant to this study are available at https://github.com/webbtheosim/ml-ternary-phase
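A quick consistency check on one sample is the lever rule: the overall composition (φ_A, φ_B) in x should equal the abundance-weighted sum of the phase compositions in y_r. The sketch below assumes the index layouts listed above, that w^α, w^β, w^γ are phase fractions that sum to one (zero for absent phases), and an incompressible ternary blend so that φ_C = 1 − φ_A − φ_B. The function names are hypothetical.

```python
def phi_c(phi_a, phi_b):
    """Third-component fraction, ASSUMING an incompressible ternary blend."""
    return 1.0 - phi_a - phi_b


def lever_rule_residual(x, y_r):
    """Return (res_A, res_B, res_w) mass-balance residuals for one sample.

    Assumed layouts (from the field descriptions above):
      x   = (chi_AB, chi_BC, chi_AC, v_A, v_B, v_C, phi_A, phi_B)
      y_r = (phiA_a, phiB_a, phiA_b, phiB_b, phiA_g, phiB_g, w_a, w_b, w_g)
    """
    phi_a, phi_b = x[6], x[7]
    phases = [(y_r[0], y_r[1]), (y_r[2], y_r[3]), (y_r[4], y_r[5])]
    weights = y_r[6:9]
    # Overall composition reconstructed as the abundance-weighted phase average.
    recon_a = sum(w * pa for w, (pa, _) in zip(weights, phases))
    recon_b = sum(w * pb for w, (_, pb) in zip(weights, phases))
    return phi_a - recon_a, phi_b - recon_b, 1.0 - sum(weights)


if __name__ == "__main__":
    # Illustrative two-phase split, not a sample from the dataset.
    x = (1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.4, 0.3)
    y_r = (0.2, 0.5, 0.6, 0.1, 0.0, 0.0, 0.5, 0.5, 0.0)
    print(lever_rule_residual(x, y_r))
```

Residuals far from zero for a real sample would suggest that one of the layout assumptions above does not hold.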
  3. Abstract: The viscosity of fluids and its decrease with increasing shear rate, known as shear thinning, play a critical role in applications ranging from lubricants and coatings to the biomedical and food-processing industries. Traditional models such as the Carreau and Eyring theories offer competing explanations for shear-thinning behavior. The Carreau model attributes viscosity reduction to molecular distortions, while the Eyring model describes shear thinning as a stress-induced transition over an activation energy barrier. This work proposes an extended-Eyring model that incorporates stress-dependent activation volumes, bridging key aspects of both theories. By modifying transition-state theory with an Evans–Polanyi perturbation analysis, we derive a generalized viscosity equation that accounts for the molecular-scale rearrangements governing fluid flow. The model is validated against computational and experimental data, including the shear-thinning behavior of pure squalane and of polyethylene oxide (PEO) aqueous solutions. Comparative analysis with the Carreau–Yasuda and conventional Eyring models demonstrates that the extended-Eyring model predicts viscosity trends with excellent accuracy over a wide range of shear rates. The introduction of stress-dependent activation volumes provides a description of molecular exchange kinetics that accounts for structural reorganization under shear. These findings offer a unified framework for modeling shear thinning and have broad implications for designing advanced lubricants, polymer solutions, and complex fluids with tailored flow properties.
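For reference, the two baseline models named in the abstract have standard closed forms: Carreau–Yasuda, η(γ̇) = η_∞ + (η_0 − η_∞)[1 + (λγ̇)^a]^((n−1)/a), and the conventional Eyring model, η(γ̇) = (σ_E/γ̇) sinh⁻¹(η_0 γ̇/σ_E), where σ_E is the Eyring stress. The sketch below evaluates both; it does not reproduce the extended-Eyring equation derived in the paper. Parameter names are our own labels, and the demo values are illustrative rather than fitted. The same Carreau–Yasuda form is a plausible way to evaluate ps_all from TopoSolv-6k, but the parameter ordering in that file is not documented here and would need to be checked.

```python
import math


def carreau_yasuda(shear_rate, eta0, eta_inf, lam, a, n):
    """Carreau-Yasuda viscosity.

    eta = eta_inf + (eta0 - eta_inf) * (1 + (lam * rate)**a) ** ((n - 1) / a)
    """
    return eta_inf + (eta0 - eta_inf) * (1.0 + (lam * shear_rate) ** a) ** ((n - 1.0) / a)


def eyring(shear_rate, eta0, sigma_e):
    """Conventional Eyring viscosity: eta = (sigma_e / rate) * asinh(eta0 * rate / sigma_e).

    sigma_e is the Eyring stress (k_B T divided by the activation volume).
    The zero-shear limit, eta -> eta0, is returned explicitly for rate == 0.
    """
    if shear_rate == 0:
        return eta0
    return sigma_e / shear_rate * math.asinh(eta0 * shear_rate / sigma_e)


if __name__ == "__main__":
    # Illustrative parameters only, not fitted values from any dataset.
    for rate in (1e-2, 1e0, 1e2, 1e4):
        print(rate, carreau_yasuda(rate, 1.0, 0.01, 1.0, 2.0, 0.5), eyring(rate, 1.0, 10.0))
```

Both forms recover η_0 at low shear rate and decrease monotonically at high shear rate when n < 1, which is the shear-thinning regime the abstract discusses.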
  4. The intended use of this archive is to facilitate meta-analysis of the Data Observation Network for Earth (DataONE, [1]).

DataONE is a distributed infrastructure that provides information about earth observation data. This dataset was derived from the DataONE network using Preston [2] between 17 October 2018 and 6 November 2018, resolving 335,213 URLs at an average retrieval rate of about 5 seconds per URL, or 720 files per hour, resulting in a gzip-compressed tar archive of 837.3 MB.

The archive associates 325,757 unique metadata URLs [3] with 202,063 unique ecological metadata files [4]. The DataONE search index was also captured to establish the provenance of how the dataset descriptors were found and acquired. During the creation of the snapshot (or crawl), 15,389 URLs [5], or 4.7% of URLs, did not resolve successfully.

To facilitate discovery, the record of the Preston snapshot crawl is included in the preston-ls-* files. These files are derived from the rdf/nquad file with hash://sha256/8c67e0741d1c90db54740e08d2e39d91dfd73566ea69c1f2da0d9ab9780a9a9f. This file can also be found in data.tar.gz at data/8c/67/e0/8c67e0741d1c90db54740e08d2e39d91dfd73566ea69c1f2da0d9ab9780a9a9f/data. For more information about concepts and format, please see [2].

To extract all EML files from the included Preston archive, first extract the hashes associated with EML files using:

    cat preston-ls.tsv.gz | gunzip | grep "Version" | grep -v "deeplinker" | grep -v "query/solr" | cut -f1,3 | tr '\t' '\n' | grep "hash://" | sort | uniq > eml-hashes.txt

then extract data.tar.gz using:

    ~/preston-archive$ tar xzf data.tar.gz

then use Preston to extract each hash using something like:

    ~/preston-archive$ preston get hash://sha256/00002d0fc9e35a9194da7dd3d8ce25eddee40740533f5af2397d6708542b9baa
    <eml:eml xmlns:eml="eml://ecoinformatics.org/eml-2.1.1" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:stmml="http://www.xml-cml.org/schema/stmml_1.1" packageId="doi:10.18739/A24P9Q" system="https://arcticdata.io" scope="system" xsi:schemaLocation="eml://ecoinformatics.org/eml-2.1.1 ~/development/eml/eml.xsd">
      <dataset>
        <alternateIdentifier>urn:x-wmo:md:org.aoncadis.www::d76bc3b5-7b19-11e4-8526-00c0f03d5b7c</alternateIdentifier>
        <alternateIdentifier>d76bc3b5-7b19-11e4-8526-00c0f03d5b7c</alternateIdentifier>
        <title>Airglow Image Data 2011 4 of 5</title>
    ...

Alternatively, without using Preston, you can extract the data using the naming convention:

    data/[x]/[y]/[z]/[hash]/data

where x is the first 2 characters of the hash, y the second 2 characters, z the third 2 characters, and hash the full sha256 content hash of the EML file.

For example, the hash hash://sha256/00002d0fc9e35a9194da7dd3d8ce25eddee40740533f5af2397d6708542b9baa can be found in the file data/00/00/2d/00002d0fc9e35a9194da7dd3d8ce25eddee40740533f5af2397d6708542b9baa/data. For more information, see [2].

[1] DataONE, https://www.dataone.org
[2] https://preston.guoda.bio, https://doi.org/10.5281/zenodo.1410543. DataONE was crawled via Preston with "preston update -u https://dataone.org".
[3] cat preston-ls.tsv.gz | gunzip | grep "Version" | grep -v "deeplinker" | grep -v "query/solr" | cut -f1,3 | tr '\t' '\n' | grep -v "hash://" | sort | uniq | wc -l
[4] cat preston-ls.tsv.gz | gunzip | grep "Version" | grep -v "deeplinker" | grep -v "query/solr" | cut -f1,3 | tr '\t' '\n' | grep "hash://" | sort | uniq | wc -l
[5] cat preston-ls.tsv.gz | gunzip | grep "Version" | grep "deeplinker" | grep -v "query/solr" | cut -f1,3 | tr '\t' '\n' | grep -v "hash://" | sort | uniq | wc -l

This work is funded in part by grant NSF OAC 1839201 from the National Science Foundation.
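The data/[x]/[y]/[z]/[hash]/data naming convention can be expressed as a small path helper, paired with a SHA-256 check that an extracted file really matches its content hash. This is a minimal sketch using only the Python standard library; it does not call Preston, and the function names `hash_to_path` and `verify_content` are hypothetical.

```python
import hashlib
import os

PREFIX = "hash://sha256/"


def hash_to_path(hash_uri, root="."):
    """Map a Preston content-hash URI to data/[x]/[y]/[z]/[hash]/data under root."""
    if not hash_uri.startswith(PREFIX):
        raise ValueError(f"expected a URI starting with {PREFIX!r}, got {hash_uri!r}")
    digest = hash_uri[len(PREFIX):]
    # x, y, z are the first, second, and third pairs of hex characters.
    return os.path.join(root, "data", digest[0:2], digest[2:4], digest[4:6], digest, "data")


def verify_content(path, hash_uri):
    """Return True if the SHA-256 digest of the file at `path` matches `hash_uri`."""
    sha = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read in 1 MiB chunks so large files need not fit in memory.
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            sha.update(chunk)
    return PREFIX + sha.hexdigest() == hash_uri


if __name__ == "__main__":
    example = "hash://sha256/00002d0fc9e35a9194da7dd3d8ce25eddee40740533f5af2397d6708542b9baa"
    print(hash_to_path(example, root="preston-archive"))
```

Reading eml-hashes.txt line by line and passing each hash through `hash_to_path` would locate every EML file in the extracted archive; `verify_content` then confirms that the file content matches the hash.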
  5. MCMC chains for the GWB analyses performed in the paper "The NANOGrav 15 yr Data Set: Search for Signals from New Physics".

The data is provided in pickle format. Each file contains a NumPy array with the MCMC chain (with burn-in already removed) and a dictionary with the model parameters' names as keys and their priors as values. You can load them as:

    import pickle

    with open("path/to/file.pkl", "rb") as pick:
        temp = pickle.load(pick)

    params = temp[0]  # dict: parameter name -> prior
    chain = temp[1]   # NumPy array: MCMC samples

The naming convention for the files is the following:
  - igw: inflationary gravitational waves (GWs)
  - sigw: scalar-induced GWs
      - sigw_box: assumes a box-like feature in the primordial power spectrum.
      - sigw_delta: assumes a delta-like feature in the primordial power spectrum.
      - sigw_gauss: assumes a Gaussian peak feature in the primordial power spectrum.
  - pt: cosmological phase transitions
      - pt_bubble: assumes that the dominant contribution to GW production comes from bubble collisions.
      - pt_sound: assumes that the dominant contribution to GW production comes from sound waves.
  - stable: stable cosmic strings
      - stable-c: stable strings emitting GWs only in the form of GW bursts from cusps on closed loops.
      - stable-k: stable strings emitting GWs only in the form of GW bursts from kinks on closed loops.
      - stable-m: stable strings emitting monochromatic GWs at the fundamental frequency.
      - stable-n: stable strings described by numerical simulations including GWs from cusps and kinks.
  - meta: metastable cosmic strings
      - meta-l: metastable strings with GW emission from loops only.
      - meta-ls: metastable strings with GW emission from loops and segments.
  - super: cosmic superstrings.
  - dw: domain walls
      - dw-sm: domain walls decaying into Standard Model particles.
      - dw-dr: domain walls decaying into dark radiation.

For each model, we provide four files. One is for the run where the new-physics signal is assumed to be the only GWB source; another is for the run where the new-physics signal is superimposed on the signal from supermassive black hole binaries (SMBHBs), and for these files "_bhb" is appended to the model name. For both scenarios, the "compare" folder provides the files for the hypermodel runs that were used to derive the Bayes factors.

In addition to chains for the stochastic models, we also provide data for the two deterministic models considered in the paper (ULDM and DM substructures). For the ULDM model, the naming convention of the files is the following (all the ULDM signals are superimposed on the SMBHB signal; see the discussion in the paper for more details):
  - uldm_e: ULDM Earth signal.
  - uldm_p: ULDM pulsar signal
      - uldm_p_cor: correlated limit
      - uldm_p_unc: uncorrelated limit
  - uldm_c: ULDM combined Earth + pulsar signal, direct coupling
      - uldm_c_cor: correlated limit
      - uldm_c_unc: uncorrelated limit
  - uldm_vecB: vector ULDM coupled to the baryon number
      - uldm_vecB_cor: correlated limit
      - uldm_vecB_unc: uncorrelated limit
  - uldm_vecBL: vector ULDM coupled to B-L
      - uldm_vecBL_cor: correlated limit
      - uldm_vecBL_unc: uncorrelated limit
  - uldm_c_grav: ULDM combined Earth + pulsar signal for gravitational-only coupling
      - uldm_c_grav_cor: correlated limit
          - uldm_c_cor_grav_low: low mass region
          - uldm_c_cor_grav_mon: monopole region
          - uldm_c_cor_grav_low: high mass region
      - uldm_c_unc: uncorrelated limit
          - uldm_c_unc_grav_low: low mass region
          - uldm_c_unc_grav_mon: monopole region
          - uldm_c_unc_grav_low: high mass region

For the substructure (static) model, we provide the chain for the marginalized distribution (as for the ULDM signal, the substructure signal is always superimposed on the SMBHB signal).
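The hypermodel chains in the "compare" folder were used to derive Bayes factors. In a two-model product-space (hypermodel) run, a common estimate is the ratio of samples spent in each model, corrected by the prior odds. The sketch below illustrates that general technique; it is not necessarily the exact procedure used in the paper. It assumes that the model-index samples are below 0.5 for model 0 and above 0.5 for model 1, with equal prior odds, and that chain columns follow the insertion order of the params dictionary. The helper names and the "nmodel" column name used in the usage note are hypothetical; which model each index denotes should be checked against the paper.

```python
def column_index(params, name):
    """Position of a parameter column, ASSUMING chain columns follow the
    insertion order of the params dictionary (check this for your file)."""
    return list(params).index(name)


def hypermodel_bayes_factor(model_index_samples, threshold=0.5, prior_odds=1.0):
    """Product-space Bayes factor estimate for model 1 versus model 0.

    Counts samples above `threshold` (model 1) and at or below it (model 0),
    then divides the posterior odds by the prior odds (model 1 : model 0).
    """
    n1 = sum(1 for v in model_index_samples if v > threshold)
    n0 = len(model_index_samples) - n1
    if n0 == 0:
        raise ZeroDivisionError("no samples in model 0; Bayes factor is unbounded")
    return (n1 / n0) / prior_odds


if __name__ == "__main__":
    # Illustrative model-index samples, not taken from any NANOGrav file.
    samples = [0.1, 0.9, 1.2, 1.4, -0.3]
    print(hypermodel_bayes_factor(samples))
```

With `params, chain` loaded as shown above, `hypermodel_bayes_factor(chain[:, column_index(params, "nmodel")])` would give the estimate, provided the model index is actually stored under that assumed name and in that column order.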