Title: Synchronized seasonal excretion of multiple coronaviruses coincides with high rates of coinfection in immature bats: Code.
Synchronized seasonal excretion of multiple coronaviruses coincides with high rates of coinfection in immature bats.

This repo contains instructions and source code for reproducing the statistical analyses in the manuscript.

Repo Contents
scripts: contains the source .R and .stan files to reproduce the analysis. Each file is detailed below in the section corresponding to its statistical analysis.
data: contains the raw source data and model-generated output.
figures: contains the final output figures from the manuscript. These can be recreated with the CovOZ_Figures_Submission_Clean.R script.

1. System Requirements

Hardware Requirements
Our source code requires only a standard computer. Much of the Markov chain Monte Carlo code runs in parallel, so a computer with ample memory and multiple cores can be advantageous. The run times below were measured on a MacBook with the recommended specs (64 GB RAM, 8 cores at 2.7 GHz). The code will also work on a Linux or Windows computer.

Software Requirements
Reproducing the statistical analyses requires a current version of R and Stan. We use version 4.4.1 of R and version 2.32.2 of Stan.

Package dependencies and versions
Users will need to install the following packages to execute the code. Versions are current as of October 1, 2024.
tidyverse 2.0.0
lubridate 1.9.3
stringr 1.5.1
rstan 2.32.6
cowplot 1.1.3
ggtext 0.1.2
jpeg 0.1-10
scales 1.3.0
tictoc 1.2.1

2. Installation Guide
Running the analysis requires:
installing R. Depending on connection speed, installing R usually takes a few minutes.
installing Stan. Depending on connection speed, installing Stan usually takes a few minutes.
installing the necessary R packages (listed above). Depending on connection speed, installing packages usually takes about 30 seconds per package.

3. Demo
This source code is not an R package and has no formal demo. Instead, the source code for each analysis is described in section 4.

4. Instructions for Use

4.1 Coinfection Analysis
Runs chi-squared tests on coinfections of beta 2d.iv and beta 2d.v. Generates the summary statistics, test statistics, and p-values reported in the manuscript.
input files: individual_variant_covariates.csv
script file: coinfection_final.R
run time: approximately 1 second

4.2 Individual Level Dynamics of Infection: Dynamic Binary Regression
Runs individual-level dynamic binary regression models. Produces an output file that can be used to recreate figures.
input files: individual_variant_covariates.csv
script files: logistic_curves_final.R, GP_regression.stan
output files: logistic_curve_out.RData
run time: approximately 66 minutes

4.3 Dynamics of Circulation at the Population Level
Runs combined (individual and pooled data) dynamic models. Produces an output file that can be used to recreate figures.
input files: combined_out_variant.csv
script files: cluster_curves_final.R, GP_withLL.stan
output files: cluster_curves.csv
run time: approximately 25 minutes

4.4 Manuscript Figures
Combined script that uses the output files created by the previous scripts to recreate all figures in the manuscript.
input files: model_output/cluster_curves.csv, combined_out_variant.csv, individual_variant_covariates.csv, model_output/logistic_curve_out.RData
script files: CovOZ_Figures_Submission_Clean.R
output files: Figure2_final.png, Figure3_final.png, Figure4_A-D_final.png, Figure6_AP.png, Figure7.png, SIFigure8.png, SIFigure9.png
run time: approximately 16 seconds

4.5 Model Comparison Integrated
Compares LOOIC values for sets of model frameworks.
input files: combined_out_variant.csv
script files: Pred_Comparisons.R, GP_withLL.stan
output files: preds.RData
run time: approximately 2 hours

4.6 Model Comparison Individual
Compares LOOIC values for sets of model frameworks.
input files: combined_out_variant.csv
script files: logistic_curves_loo.R, GP_regression.stan, GP_regression_add.stan, GP_regression_interact.stan
output files: logistic_curve_loo_age.RData, logistic_curve_loo_age_add_sex.RData, logistic_curve_loo_age_interact_sex.RData
run time: approximately 6 hours 45 minutes
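As a rough illustration of the kind of test described in section 4.1, the sketch below runs a chi-squared test of independence on a 2x2 coinfection table using base R. It is not taken from coinfection_final.R: the counts are invented, the object names (counts, result) are hypothetical, and the choice to disable the continuity correction is an assumption, since the README does not say how the script configures the test.

```r
# Hypothetical 2x2 table of bat samples cross-classified by infection status.
# These counts are made up for illustration; they are NOT read from
# individual_variant_covariates.csv.
counts <- matrix(
  c(30,  20,
    25, 125),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    beta_2d_iv = c("positive", "negative"),
    beta_2d_v  = c("positive", "negative")
  )
)

# Pearson chi-squared test of independence (stats package, shipped with R).
# correct = FALSE turns off Yates' continuity correction; whether the
# manuscript's script does the same is an assumption here.
result <- chisq.test(counts, correct = FALSE)

result$statistic  # chi-squared test statistic
result$parameter  # degrees of freedom
result$p.value    # p-value for the null hypothesis of independence
result$expected   # expected counts if the two infections were independent
```

A small p-value here would indicate that the two variants co-occur more (or less) often than independence predicts; comparing the observed and expected counts shows the direction.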
Award ID(s):
2231624 2133763 1716698
PAR ID:
10674325
Publisher / Repository:
Zenodo
Edition / Version:
v1.0.0
Format(s):
Medium: X
Right(s):
GNU General Public License v3.0 only
Sponsoring Org:
National Science Foundation
More Like this
  1. LUMINEX Wildlife Disease Analysis Pipeline

     Overview
     Bayesian analysis pipeline for "Cohorts of immature Pteropus bats show interannual variation in Hendra virus serology"

     Summary

     Prerequisites

     Software Requirements
     - R (≥ 4.0.0)
     - Stan (≥ 2.21)

     Required R Packages
         # Core packages
         install.packages(c("rstan", "tidyverse", "here", "loo", "bayesplot"))
         # Additional packages
         install.packages(c("RColorBrewer", "cowplot", "boot", "see", "factoextra", "bestNormalize", "LaplacesDemon", "ggpubr", "plyr", "see", "pscl"))
         # Core utility functions (CRITICAL DEPENDENCY)
         source(here("R", "useful_functions.R"))  # Bayesian analysis functions
         source(here("R", "paper_theme.R"))       # Plotting themes

     Hardware Requirements
     - Storage: ≥5GB free space
     - CPU: Multi-core processor recommended

     Data Requirements
     /raw_sharable folder

     Execution Workflow

     ⚠️ CRITICAL: Execute in This Order

     Phase 1: Core Data Processing

     1. Initial Data Cleaning
            source("R/create_datasets_for_stan_part_1.R")
        - Runtime: ~5-10 minutes
        - Outputs: data_for_cohort_model.csv, luminex_igg_igm.csv
     2. Serology Classification
            rmarkdown::render("R/mixture_model_final.R")
        - Runtime: ~30-60 minutes (Stan model fitting)
        - Outputs: serology_prob.csv
     3. Age/Cohort Assignment
            rmarkdown::render("R/cohort_model_2025_05_12.Rmd")
        - Runtime: ~1-3 hours (complex Stan models)
        - Outputs: age_predictions.csv & a species comparison analysis
     4. Dataset Integration
            source("R/create_datasets_for_stan_part_2_06_30_24.R")
        - Runtime: ~10-15 minutes
        - Outputs: All analysis-ready datasets with time-alive variables

     Phase 2: Primary Analyses

     5. Prevalence Smoothing Analysis
            rmarkdown::render("R/gaussian_smooth_prevalence_06_30_24.Rmd")
        - Runtime: ~10+ hours (multiple Stan models)
        - Key outputs: Prevalence curves, model comparisons
        - Additional outputs:
          - Basic prevalence smoothing (4 pathogens × 4 cohorts)
          - Site-specific analysis (8-cohort models)
          - Sex-stratified analysis
          - Stringent cohort cutoff analysis
          - Multiple cutoff threshold testing (5 different cutoffs)
          - Date-based modeling
          - Batch effect testing
          - Adult vs juvenile comparisons
     6. Logistic Regression Analysis
            source("R/logistic_models_fig_3.R")
            source("R/logistic_models_fig_2.R")
        - Runtime: ~30-60 minutes
        - Outputs: Figure 2 & 3 plots

     Phase 3: Supporting Analyses (Optional)

     7. Additional Analyses (run as needed):
        - adult_prevalence_curves.Rmd - Adult dynamics
        - PCA_new_analysis.R - Multivariate analysis
        - additional_figures.R - Supplementary figures

     Key Functions (useful_functions.R)
     - fit_4cohort_model() - Bayesian 4-cohort prevalence model
     - compile_stan_results() - Extract and format Stan results
     - create_time_sequence() - Generate prediction timepoints
     - compute_loo_cv() - Leave-one-out cross-validation
     - plot_parameter_diagnostics() - Model diagnostic plots

     Troubleshooting

     Common Issues

     Stan compilation errors:
         # Recompile Stan models
         rstan_options(auto_write = TRUE)
         options(mc.cores = parallel::detectCores())

     Memory issues:
     - Reduce Stan iterations: ITER = 1000 instead of ITER = 2000
     - Run analyses sequentially, not in parallel

     Missing dependencies:
         # Load all utility functions
         source(here("R", "useful_functions.R"))
         source(here("R", "paper_theme.R"))

     Resume from Saved Results
     Many scripts save intermediate results:
         # Check for existing model fits
         if(file.exists("model_results.RData")) {
           load("model_results.RData")
         } else {
           # Run full analysis
         }

     Output Structure
         Luminex_figs/r_figs/     # All generated figures
         Data_for_publication/    # Final analysis datasets
         stan/                    # Stan model files
         R/                       # Analysis scripts

     Expected Runtime
     Full pipeline: 10-20 hours on modern hardware
  2. This package contains data, outputs, equations, and R scripts for the analyses in the manuscript entitled "Hot droughts in the Amazon: A window to a future hypertropical climate" by J. Chambers et al. In particular, it contains statistical models and analyses for the INPA BIONTE tree mortality study. The Models folder contains details for all statistical models in PDF files. The Scripts folder contains the R scripts for the Bayesian Hierarchical Models (two text files) and the SEMs (one text file), which are separate and reasonably annotated. All data associated with these scripts are in the Data folder. The Data folder contains two of the three CSV files used for the analyses, which are called by the R scripts. These two files are part of published datasets (`BIONTE_mortality-rates.csv` from Lima et al. 2024, DOI:10.15486/ngt/1898910 and `SPEI.csv` from Pastorello et al. 2023, DOI:10.15486/ngt/1958257) and are also provided in this package for convenience (please see the corresponding datasets for usage and citation terms). The third dataset (`BIONTE_gapfilled_wd.csv`) contains sensitive information and can be obtained by contacting the manuscript lead author. The Outputs folder contains the two output files that provide extra information about the analyses. The file `figuresFeb2025d.pdf` contains all the figures from the manuscript; captions are in the manuscript. The file `ChambersMS.pdf` contains primary results from Bayesian statistical models, regression analyses, and validation steps applied to the tree mortality data from the INPA experiments. The document includes visual summaries, model diagnostics, and leave-one-out (LOO) validation results. A breakdown of file contents can be found in the README file that is part of this package.
  3. This dataset is associated with a manuscript on river plumes and idealized coastal corners with first author Michael M. Whitney. The dataset includes source code, compilation files, and routines to generate input files for the Regional Ocean Modeling System (ROMS) runs used in this study. ROMS output files in NetCDF format are generated by executing the compiled ROMS code with the input files. The dataset also includes MATLAB routines and datafiles for the analysis of model results and generation of figures in the manuscript. The following zip files are included: ROMS_v783_Yan_code.zip [ROMS source code branch used in this study] coastalcorner_ROMS_compilation.zip [files to compile ROMS source code and run-specific Fortran-90 built code] coastalcorner_ROMS_input_generate_MATLAB.zip [ROMS ASCII input file and MATLAB routines to generate ROMS NetCDF input files for runs] coastalcorner_MATLAB_output_analysis.zip [MATLAB data files with selected ROMS output fields and custom analysis routines and datafiles in MATLAB formats used in this study] coastalcorner_MATLAB_figures.zip [custom MATLAB routine for manuscript figure generation and MATLAB data files with all data fields included in figures] coastalcorner_tif_figures.zip [TIF image files of each figure in manuscript] 
  4. Why rivers confine flow to a single channel (single-thread planform) or divide flow into multiple sub-channels (multi-thread planform) forms a longstanding fundamental question in river science, which to date remains poorly understood. In the associated manuscript, we probe planform origins using a novel dataset of 11+ million riverbank migration vectors mapped from 36 years of global satellite imagery along 84 river systems. Results show single-thread rivers originate from a balance between bank erosion and opposing-bar deposition, which maintains an equilibrium width as channels migrate. In contrast, multi-thread rivers originate from imbalance: bank erosion outpaces opposing-bar deposition, causing sub-channels to repeatedly widen and split. This width instability challenges equilibrium paradigms in river science, endangers riverside communities, and lowers the potential costs of nature-based river restoration projects along multi-thread rivers. Here, we provide the data and codes that form the foundation of this manuscript.

     Technical Info:

     # Data and code for: River planforms originate from (im)balance between bank erosion and bar accretion

     ---
     author: Austin Chadwick
     contact: [achadwick@ldeo.columbia.edu](mailto:achadwick@ldeo.columbia.edu), [austin.chadwick23@gmail.com](mailto:austin.chadwick23@gmail.com)

     These materials are organized into two folders, Codes and Data. The Data folder contains spreadsheets, GIS geopackages, remote sensing images, and MATLAB data files for the associated manuscript "River planforms originate from (im)balance between bank erosion and bar accretion". The Codes folder provides MATLAB scripts and functions to generate the analysis and figures of the main manuscript. The last folder, temp, is empty; it acts as a temporary destination for output files that can be generated by the codes.

     ## Description of the data and file structure

     All data are found in the folder "Data", which can be accessed by opening the zip file "Data.zip". The following is a breakdown of each file and subfolder in the Data folder.

     * DataLog_082224_1.xlsx
       This Excel document records the files used to apply PIV to remote sensing imagery, organized on a site-by-site basis. All files referenced therein are found in the Data folder. See the first sheet therein, "Readme", for details.
     * Polygons subfolder
       This subfolder contains geopackage files (.gpkg) for each study reach in the main manuscript. Each file contains geopackage data for a single polygon that outlines and defines the study reach. These data were used to extract GEOTIFFs using Google Earth Engine, and were obtained by manual selection in QGIS. See main manuscript for details on the selection process and sizing of each polygon.
     * Geotiffs subfolder
       This folder contains DSWE-derived river-mask GEOTIFFs derived from Landsat median annual composites, used for the analysis and figures of the main manuscript "River planforms originate from (im)balance between bank erosion and bar accretion". The DSWE-derived river masks were created in Google Earth Engine using the "GEE_watermasks" code found on the GitHub Repository ([https://github.com/evan-greenbrg/GEE_watermasks](https://github.com/evan-greenbrg/GEE_watermasks)). Each subfolder corresponds to a study reach in the main manuscript (see Table 1 in main manuscript), with nested subfolders containing river masks derived using DSWE confidence levels 1 through 4 (mask1, mask2, mask3, mask4).
     * PreparedImagery subfolder
       This folder contains true-color images and DSWE-derived river masks (.tif) for each study reach. These data were used as input data for PIV. For each site, there is a subfolder containing true-color images ("Color") and sixteen subfolders containing DSWE-derived river masks for the four different confidence levels and four different tilt angles ("MaskX_TiltYY", where X is the confidence level and YY is the tilt angle). In each of these subfolders, there are two nested subfolders, "RemovedBlanks" and "RemovedForSubsampling", which contain additional .tif files that were removed because they were blank or because optimal PIV analysis required subsampling, respectively. For details of this process, see: [https://doi.org/10.1029/2023JF007177](https://doi.org/10.1029/2023JF007177).
     * OutputFromPIVlab subfolder
       This folder contains the raw PIV data output from PIVlab software for each site. Each site-specific subfolder contains sixteen .mat files, each of which corresponds to a specific confidence level and tilt angle in the prepared imagery (see DataLog_082224_1.xlsx for details on which file corresponds to which confidence level and tilt angle). For additional instructions on how to operate PIVlab, please see the following links: [http://sead-published.ncsa.illinois.edu/seadrepository/api/researchobjects/urn:uuid:6154f24ae4b0312e761fb761](http://sead-published.ncsa.illinois.edu/seadrepository/api/researchobjects/urn:uuid:6154f24ae4b0312e761fb761) [https://www.mathworks.com/matlabcentral/fileexchange/27659-pivlab-particle-image-velocimetry-piv-tool-with-gui](https://www.mathworks.com/matlabcentral/fileexchange/27659-pivlab-particle-image-velocimetry-piv-tool-with-gui)
     * PostprocessedPIV subfolder
       This folder contains the postprocessed PIV data output. Each site-specific subfolder contains four .mat files. These files contain the postprocessed data before filtering (Unfiltered.mat), the postprocessed data after filtering ("Filtered.mat"), the postprocessed data after filtering and setting all NaN values along high-confidence banks to zero ("Filtered_NanToZeroOnBank").
     * BankNormalVectors subfolder
       This folder contains the bank-normal vector fields for each image. The data structure is identical to the PostprocessedPIV data, with a single .mat file for each river, "BankNormalVectors.mat". All vectors have a length of 1, and are oriented perpendicular to the nearest channel bank, pointing away from the wetted area and towards the dry area. For details on this calculation, see: [https://doi.org/10.1029/2021WR031236](https://doi.org/10.1029/2021WR031236).
     * CrossSections subfolder
       This folder contains the data for the thread cross sections randomly sampled in each reach to calculate the differential migration rate (delv). Each site-specific subfolder contains a single .mat file, "CrossSections.mat". This file contains the positions (x,y) and the bank migration vectors (u,v) for each cross section. Spatial units are in pixels, and temporal units are in frames, such that velocities are in units of pixels/frame.
     * WettedAreas subfolder
       This folder contains the data for the wetted channel areas, used to investigate apparent changes in water discharge and plot supplemental figures. Each site-specific subfolder contains a single .mat file, "WettedAreas.mat". This file contains the positions (x,y) and the bank migration vectors (u,v) for each cross section.

     ## Sharing/Access information

     Some data referenced in the Datalog_082224.xlsx spreadsheet was derived from previous work:

     * Sylvester et al., 2019: [https://doi.org/10.1130/G45608.1](https://doi.org/10.1130/G45608.1)
     * Rowland et al., 2019: [https://doi.org/10.15485/1571527](https://doi.org/10.15485/1571527)
     * Galeazzi et al., 2021: [https://doi.org/10.1130/G49121.1](https://doi.org/10.1130/G49121.1)
     * Ielpi et al., 2023: [https://doi.org/10.1029/2022GL101285](https://doi.org/10.1029/2022GL101285)

     ## Code/Software

     All codes are found in the folder "Codes", which can be accessed by opening the zip folder "Codes.zip". To run any codes, it is necessary to add the "Codes" and "Data" folders (and their subfolders) to your current MATLAB path. This can be done by right-clicking the folder in the MATLAB GUI and selecting "Add to Path > Selected Folders and Subfolders", which ensures that all input and output files are identified properly. The following is a breakdown of each file and subfolder in the Codes folder:

     * AnalsysisAndFigures1_Worldmap_082924_1.m
       This MATLAB script reproduces Figure 1A of the manuscript and its associated analysis.
     * AnalysisAndFigures2_PIVmethods_082924_1.m
       This MATLAB script reproduces Figure 1E of the manuscript and its associated analysis.
     * AnalysisAndFigures3_DifferentialMigration_082924_1.m
       This MATLAB script reproduces Figure 2A and Figure 4 of the manuscript, Text S1 Figures 1–4 of the supplement, and their associated analysis. It also reproduces the tabular data in Data S1, performs the statistical tests on delv* reported in the text, and counts the number of vector measurements in our database as reported in the text.
     * AnalysisAndFigures4_TimeSlices_082924_1.m
       This MATLAB script reproduces Figure 2B-G and Figure 4 of the manuscript and Figures S1–S3 of the supplement and their associated analysis. In the first code block (L23–36), the user designates the reach and cross section to plot by uncommenting the corresponding line. See the code's comments on L23–36 for details.
     * AnalysisAndFigures5_MaterialsAndMethods_082924_1
       This MATLAB script reproduces the supplement's Materials & Methods Figures 1–6 and their associated analysis. In the first code block (L19-23), the user designates the reach to plot by uncommenting the corresponding line. See the code's comments on L19-23 for details.
     * AnalysisAndFigures6_SuppMovieFrames_082924_1.m
       This MATLAB script reproduces the frames of supplementary Movies S1–S3. In the first code block (L18-23), the user designates the reach to plot by uncommenting the corresponding line. See the code's comments on L18-23 for details.
     * LatLonBySite_081924_1.xlsx
       This Excel document records the approximate latitude and longitude coordinates of each studied reach in the main manuscript. These coordinates were used to generate Fig. 1 in the main manuscript, and were obtained by manually inspecting the coordinates of each .gpkg file in QGIS. The precise geographic coordinates for each reach can be found in the .gpkg files, in the Polygons subfolder described above.

     ### Change Log

     Version 2: minor changes in README.
  5. This resource contains source code and select data products behind the following Master's Thesis: Platt, L. (2024). Basins modulate signatures of river salinization (Master's thesis). University of Wisconsin-Madison, Freshwater and Marine Sciences. The source code represents an R-based data processing and modeling pipeline written using the R package "targets". Some of the folders in the source code zipfile are intentionally left empty (except for a hidden file ".placeholder") so that the code repository is set up with the required folder structure. To execute this code, download the zip folder, unzip it, and open the salt-modeling-data.Rproj file. Then, reference the instructions in the README.md file for installing packages, building the pipeline, and examining the results. Newer versions of this repository may be available on GitHub at github.com/lindsayplatt/salt-modeling-data. In addition to the source code, this resource contains three data files containing intermediate products of the pipeline. The first two represent data prepared for the random forest modeling. Data download and processing were completed in pipeline phases 1 - 5, and the random forest modeling was completed in phase 6 (see source code).
     - site_attributes.csv contains the USGS gage site numbers and their associated basin attributes.
     - site_classifications.csv contains the classification of a site for both episodic signatures ("Episodic" or "Not episodic") and baseflow salinization signatures ("positive", "none", "negative", or NA). Note that an NA in the baseflow classification column means that the site did not meet minimum data requirements for calculating a trend and was not used in the random forest model for baseflow salinization.
     - site_attribute_details.csv contains a table of each attribute shorthand used as a column name in site_attributes.csv, with its name, units, description, and data source.