{"Abstract":[" LUMINEX Wildlife Disease Analysis Pipeline\n\n Overview\n\n Bayesian analysis pipeline for "Cohorts of immature Pteropus bats show interannual variation in Hendra virus serology"\n\nSummary \n\n Prerequisites\n\n Software Requirements\n\n - R (≥ 4.0.0) - Stan (≥ 2.21)\n\n Required R Packages\n\n # Core packages install.packages(c("rstan", "tidyverse", "here", "loo", "bayesplot")) # Additional packages install.packages(c("RColorBrewer", "cowplot", "boot", "see", "factoextra", "bestNormalize", "LaplacesDemon", "ggpubr", "plyr", "see", "pscl")) # Core utility functions (CRITICAL DEPENDENCY) source(here("R", "useful_functions.R")) # Bayesian analysis functions source(here("R", "paper_theme.R")) # Plotting themes Hardware Requirements\n\n - Storage: ≥5GB free space - CPU: Multi-core processor recommended\n\n Data Requirements\n\n /raw_sharable folder\n\n Execution Workflow\n\n ⚠️ CRITICAL: Execute in This Order\n\n Phase 1: Core Data Processing\n\n 1. Initial Data Cleaning source("R/create_datasets_for_stan_part_1.R") - Runtime: ~5-10 minutes - Outputs: data_for_cohort_model.csv, luminex_igg_igm.csv 2. Serology Classification rmarkdown::render("R/mixture_model_final.R") - Runtime: ~30-60 minutes (Stan model fitting) - Outputs: serology_prob.csv 3. Age/Cohort Assignment rmarkdown::render("R/cohort_model_2025_05_12.Rmd") - Runtime: ~1-3 hours (complex Stan models) - Outputs: age_predictions.csv & a species comparison analysis 4. Dataset Integration source("R/create_datasets_for_stan_part_2_06_30_24.R") - Runtime: ~10-15 minutes - Outputs: All analysis-ready datasets with time-alive variables\n\n Phase 2: Primary Analyses\n\n 5. 
### Phase 2: Primary Analyses

5. **Prevalence Smoothing Analysis**: `rmarkdown::render("R/gaussian_smooth_prevalence_06_30_24.Rmd")`
   - Runtime: ~10+ hours (multiple Stan models)
   - Key outputs: prevalence curves, model comparisons
   - Additional outputs:
     - basic prevalence smoothing (4 pathogens × 4 cohorts)
     - site-specific analysis (8-cohort models)
     - sex-stratified analysis
     - stringent cohort cutoff analysis
     - multiple cutoff threshold testing (5 different cutoffs)
     - date-based modeling
     - batch effect testing
     - adult vs. juvenile comparisons
6. **Logistic Regression Analysis**: `source("R/logistic_models_fig_3.R")` and `source("R/logistic_models_fig_2.R")`
   - Runtime: ~30-60 minutes
   - Outputs: Figure 2 and Figure 3 plots

### Phase 3: Supporting Analyses (Optional)

7. **Additional Analyses** (run as needed):
   - `adult_prevalence_curves.Rmd`: adult dynamics
   - `PCA_new_analysis.R`: multivariate analysis
   - `additional_figures.R`: supplementary figures

## Key Functions (`useful_functions.R`)

- `fit_4cohort_model()`: Bayesian 4-cohort prevalence model
- `compile_stan_results()`: extract and format Stan results
- `create_time_sequence()`: generate prediction timepoints
- `compute_loo_cv()`: leave-one-out cross-validation
- `plot_parameter_diagnostics()`: model diagnostic plots

## Troubleshooting

### Common Issues

Stan compilation errors:

```r
# Cache compiled Stan models and use all available cores
rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())
```

Memory issues:

- Reduce Stan iterations: `ITER = 1000` instead of `ITER = 2000`
- Run analyses sequentially, not in parallel

Missing dependencies:

```r
# Load all utility functions
source(here("R", "useful_functions.R"))
source(here("R", "paper_theme.R"))
```

### Resume from Saved Results

Many scripts save intermediate results:

```r
# Check for existing model fits before refitting
if (file.exists("model_results.RData")) {
  load("model_results.RData")
} else {
  # Run full analysis
}
```

## Output Structure

```
Luminex_figs/r_figs/    # All generated figures
Data_for_publication/   # Final analysis datasets
stan/                   # Stan model files
R/                      # Analysis scripts
```

## Expected Runtime

Full pipeline: 10-20 hours on modern hardware.
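The resume-from-saved-results pattern can be wrapped in a small reusable helper. `fit_or_load()` is an illustrative name, not a function exported by `useful_functions.R`, and the cheap `lm()` fit below stands in for an expensive Stan fit:

```r
# Illustrative cache wrapper (not from useful_functions.R): refit only
# when no saved result exists, otherwise reload the cached object.
fit_or_load <- function(cache_file, fit_fun) {
  if (file.exists(cache_file)) {
    env <- new.env()
    load(cache_file, envir = env)  # the file is expected to hold `fit`
    env$fit
  } else {
    fit <- fit_fun()
    save(fit, file = cache_file)
    fit
  }
}

# First call fits and caches; second call only reloads.
cache <- file.path(tempdir(), "model_results.RData")
fit1 <- fit_or_load(cache, function() lm(mpg ~ wt, data = mtcars))
fit2 <- fit_or_load(cache, function() stop("not reached: loaded from cache"))
```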
Dataset: Coral high molecular weight carbohydrates support opportunistic microbes in bacterioplankton from an algae-dominated reef
This dataset contains raw data for figures 5 (genus-level microbial community compositions) and 6 (predicted metabolic functions, pathway types), R code for PERMANOVAs (Table 3), DESeq2 and random forest (rfPermute) analyses, and R code to generate figures 5, 6b, S5 & S6.

Overview of .txt files:
- Genus_16S_Counts.txt: counts data used for DESeq2 analysis (Fig. 5c).
- Genus_16S_relAbund.txt: relative abundance data used for Fig. 5a, b & d.
- MicFunPred_MetaCyc_types_all: predicted pathway abundance data for all pathway types, used for DESeq2 (Fig. 6b), PERMANOVA (Table 3) and column clustering of Fig. 6b.
- MicFunPred_MetaCyc_AA_types.txt: amino acids (Fig. 6b)
- MicFunPred_MetaCyc_CH_types.txt: carbohydrates (Fig. 6b)
- MicFunPred_MetaCyc_EM_types.txt: energy metabolism (Fig. 6b)
- MicFunPred_MetaCyc_FAL_types.txt: fatty acids and lipids (Fig. 6b)
- MicFunPred_MetaCyc_SM_types.txt: secondary metabolism (Fig. 6b)
- MicFunPred_MetaCyc_OBiosyn_types.txt: other biosynthesis (Fig. S6)
- MicFunPred_MetaCyc_ODeg_types.txt: other degradation (Fig. S6)
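The counts table and the relative-abundance table are linked by a per-sample normalisation. A base-R sketch with a synthetic matrix; the commented `read.delim()` call and the assumed layout (taxa in rows, samples in columns) are illustrative guesses about the file format, not taken from the dataset itself:

```r
# Illustrative counts-to-relative-abundance conversion. The real data
# would come from something like:
#   counts <- read.delim("Genus_16S_Counts.txt", row.names = 1)
counts <- matrix(c(10, 30, 60,
                    5,  5, 90),
                 nrow = 3,
                 dimnames = list(paste0("genus", 1:3), c("s1", "s2")))
# Divide each sample (column) by its library size so columns sum to 1:
rel_abund <- sweep(counts, 2, colSums(counts), "/")
```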
- Award ID(s):
- 2023298
- PAR ID:
- 10662925
- Publisher / Repository:
- Zenodo
- Date Published:
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
Replication Data for: Physics potential of the IceCube Upgrade for atmospheric neutrino oscillations

Data and plotting scripts for reproducing plots from "Physics potential of the IceCube Upgrade for atmospheric neutrino oscillations". Please refer to the related publication for a detailed explanation of the sample and analysis.

Contents:
1. README.md: useful information about the contents of this data release.
2. example.py: example script demonstrating how to load the csv files, plot the chi-squared map, and extract the 90% sensitivity contours shown in Figures 11 and 15.
3. modchi2map_nufitwoSK.csv: chi2 map used to produce Figure 11 (left), in which the injected truth point is the best-fit point from NuFit 5.2 w/o SK (upper panel, http://www.nu-fit.org/sites/default/files/v52.tbl-parameters.pdf): sin²(θ₂₃) = 0.572 and Δm²₃₂ = 2.43×10⁻³ eV².
4. modchi2map_nufitwSK.csv: chi2 map used to produce Figure 11 (right), in which the injected truth point is the best-fit point from NuFit 5.2 w/ SK (lower panel): sin²(θ₂₃) = 0.451 and Δm²₃₂ = 2.43×10⁻³ eV².
5. modchi2map_icecube.csv: chi2 map used to produce Figure 15, in which the injected truth point is the best-fit point from the latest published IceCube DeepCore oscillation result (Phys. Rev. Lett. 134, 091801 (2025), https://journals.aps.org/prl/abstract/10.1103/PhysRevLett.134.091801): sin²(θ₂₃) = 0.54 and Δm²₃₂ = 2.40×10⁻³ eV².

Please note: the CSV files are available for download through Dataverse in either csv or tab format.
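The contour-extraction step that example.py demonstrates can also be sketched in base R. The surface below is a synthetic quadratic stand-in for the released csv maps (the best-fit point and widths are illustrative), and Δχ² = 4.61 is the standard 90% CL threshold for two free parameters:

```r
# Sketch: extract a 90% CL contour from a chi2 map over
# (sin^2 theta23, Delta m^2_32). Delta chi2 = 4.61 is the 90% quantile
# of a chi-squared distribution with 2 degrees of freedom.
s2t23 <- seq(0.40, 0.65, length.out = 101)
dm232 <- seq(2.2e-3, 2.7e-3, length.out = 101)
# Synthetic quadratic chi2 surface around an assumed best-fit point:
chi2  <- outer(s2t23, dm232, function(x, y)
  ((x - 0.54) / 0.03)^2 + ((y - 2.40e-3) / 0.06e-3)^2)
contour90 <- contourLines(s2t23, dm232, chi2,
                          levels = qchisq(0.90, df = 2))  # ~4.61
```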
Authors: Wesley J. Sparagon, Milou G. I. Arts, Zachary Quinlan, Irina Koester, Jacqueline Comstock, Jessica A. Bullington, Craig A. Carlson, Pieter C. Dorrestein, Lihini I. Aluwihare, Linda Wegley Kelly, Andreas F. Haas and Craig E. Nelson. Contains relevant raw data, R code for analysis and figure generation, and output data frames used for figures.
DAMP21ka.nc: NetCDF file containing the model prior, proxy values, and DAMP21ka reconstruction for lake status, precipitation, and temperature variables.

clhancock/DAMP21ka-v1.0.0.zip: notebooks used to generate figures for Hancock et al. (2024).

Holocene-code_development_hydroclimate.zip: code used to generate the DAMP21ka reconstruction.

Hancock, C. L., Erb, M. P., McKay, N. P., Dee, S. G., and Ivanovic, R.: A global Data Assimilation of Moisture Patterns from 21,000–0 BP (DAMP-21ka) using lake level proxy records.
This dataset includes statistically resampled monthly time series data of Arctic sea ice area and gridded data for March and September sea ice concentration for a selection of large ensemble climate models and observational datasets. Arctic sea ice concentrations and areas are resampled from all available members of six coupled climate models from the Coupled Model Intercomparison Project 5 (CMIP5). These six models are: the second-generation Canadian Earth System Model (CanESM2), the Community Earth System Model version 1 (CESM1), the Commonwealth Scientific and Industrial Research Organisation Global Climate Model Mark 3.6 (CSIRO Mk3.6), the Geophysical Fluid Dynamics Laboratory Coupled Climate Model version 3 (GFDL CM3), the Geophysical Fluid Dynamics Laboratory Earth System Model version 2 with Modular Ocean Model version 4.1 (GFDL ESM2M), and the Max Planck Institute Earth System Model version 1 (MPI ESM1). The four observational datasets are the Hadley Centre Sea Ice and Sea Surface Temperature data set version 1 (HadISST1), the National Oceanic and Atmospheric Administration and National Snow and Ice Data Center Climate Data Record version 4 (CDR), the National Aeronautics and Space Administration Team algorithm (NT), and the National Aeronautics and Space Administration Bootstrap algorithm (BT). The sea ice area data are resampled 10,000 times and the standard deviation of those resamplings is calculated, which can be considered analogous to interannual variability of sea ice area (SIA). The standard deviation (sigma) and mean (mu) of these data represent the variability and typical values, respectively, of interannual variability found in each ensemble member or observational dataset. Sea ice concentration is resampled 1,000 times, with the same standard deviation and mean metrics computed for sea ice concentration.
This dataset was created to evaluate climate model projections of Arctic sea ice interannual variability and is used in the article Wyburn-Powell, Jahn, and England (2022), "Modeled Interannual Variability of Arctic Sea Ice Cover is Within Observational Uncertainty", Journal of Climate, https://doi.org/10.1175/JCLI-D-21-0958.1. This work was conducted at the University of Colorado Boulder from 2020-2022. The figures from the Journal of Climate article can be reproduced from the following datasets. The code used to create the datasets can be located at https://www.doi.org/10.5281/zenodo.6687725.
- Figure 1: Sigma_obs_SIA.nc
- Figure 2: Sigma_obs_SIA.nc, Mu_obs_SIA.nc, Sigma_mem_SIA.nc, Mu_mem_SIA.nc
- Figure 3: Sigma_mem_varying_time_periods_1965_2066_03.nc, Sigma_LE_varying_time_periods_1965_2066_03.nc, Sigma_LE_varying_time_periods_1970_2040_09.nc, Sigma_obs_varying_time_periods_1953_2020.nc
- Figure 4: Sigma_obs_SIA.nc, Sigma_mem_SIA.nc
- Figure 5: Sigma_obs_SIA.nc
- Figure 6: <model_name>_resampled_0<month>_individual.nc, <observational_dataset>_resampled_individual_1979_2020_03_09.nc
- Figure 7: Sigma_obs_SIA.nc, Mu_obs_SIA.nc, Sigma_mem_SIA.nc, Mu_mem_SIA.nc
- Figure 8: <model_name>_resampled_0<month>_individual.nc, <observational_dataset>_resampled_individual_1979_2020_03_09.nc
- Figure 9: Sigma_mem_SIA.nc, Sigma_LE_SIA.nc
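The resampling scheme behind the sigma and mu metrics can be sketched in a few lines of base R; `sia` below is a synthetic stand-in for an observed annual sea ice area series, and 1,000 draws stand in for the 10,000 used for SIA in the dataset:

```r
# Bootstrap sketch of the sigma/mu metrics: resample the annual series,
# take the SD of each resampling (a proxy for interannual variability),
# then summarise those estimates. All numbers here are synthetic.
set.seed(1)
sia <- rnorm(42, mean = 6.0, sd = 0.5)  # toy annual SIA, in 10^6 km^2
boot_sd <- replicate(1000, sd(sample(sia, replace = TRUE)))
mu    <- mean(boot_sd)  # typical interannual variability
sigma <- sd(boot_sd)    # spread of that variability estimate
```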
