Title: "Cohorts of immature Pteropus bats show interannual variation in Hendra virus serology"
Abstract:

LUMINEX Wildlife Disease Analysis Pipeline

Overview

Bayesian analysis pipeline for "Cohorts of immature Pteropus bats show interannual variation in Hendra virus serology".

Prerequisites

Software Requirements

  - R (≥ 4.0.0)
  - Stan (≥ 2.21)

Required R Packages

    # Core packages
    install.packages(c("rstan", "tidyverse", "here", "loo", "bayesplot"))

    # Additional packages
    install.packages(c("RColorBrewer", "cowplot", "boot", "see", "factoextra",
                       "bestNormalize", "LaplacesDemon", "ggpubr", "plyr", "pscl"))

    # here() must be available before the source() calls below
    library(here)

    # Core utility functions (CRITICAL DEPENDENCY)
    source(here("R", "useful_functions.R"))  # Bayesian analysis functions
    source(here("R", "paper_theme.R"))       # Plotting themes

Hardware Requirements

  - Storage: ≥5GB free space
  - CPU: Multi-core processor recommended

Data Requirements

  - /raw_sharable folder

Execution Workflow

⚠️ CRITICAL: Execute in This Order

Phase 1: Core Data Processing

  1. Initial Data Cleaning
       source("R/create_datasets_for_stan_part_1.R")
     - Runtime: ~5-10 minutes
     - Outputs: data_for_cohort_model.csv, luminex_igg_igm.csv
  2. Serology Classification
       rmarkdown::render("R/mixture_model_final.R")
     - Runtime: ~30-60 minutes (Stan model fitting)
     - Outputs: serology_prob.csv
  3. Age/Cohort Assignment
       rmarkdown::render("R/cohort_model_2025_05_12.Rmd")
     - Runtime: ~1-3 hours (complex Stan models)
     - Outputs: age_predictions.csv and a species comparison analysis
  4. Dataset Integration
       source("R/create_datasets_for_stan_part_2_06_30_24.R")
     - Runtime: ~10-15 minutes
     - Outputs: All analysis-ready datasets with time-alive variables

Phase 2: Primary Analyses

  5. Prevalence Smoothing Analysis
       rmarkdown::render("R/gaussian_smooth_prevalence_06_30_24.Rmd")
     - Runtime: ~10+ hours (multiple Stan models)
     - Key outputs: Prevalence curves, model comparisons
     - Additional outputs:
       - Basic prevalence smoothing (4 pathogens × 4 cohorts)
       - Site-specific analysis (8-cohort models)
       - Sex-stratified analysis
       - Stringent cohort cutoff analysis
       - Multiple cutoff threshold testing (5 different cutoffs)
       - Date-based modeling
       - Batch effect testing
       - Adult vs juvenile comparisons
  6. Logistic Regression Analysis
       source("R/logistic_models_fig_3.R")
       source("R/logistic_models_fig_2.R")
     - Runtime: ~30-60 minutes
     - Outputs: Figure 2 and Figure 3 plots

Phase 3: Supporting Analyses (Optional)

  7. Additional Analyses (run as needed):
     - adult_prevalence_curves.Rmd - Adult dynamics
     - PCA_new_analysis.R - Multivariate analysis
     - additional_figures.R - Supplementary figures

Key Functions (useful_functions.R)

  - fit_4cohort_model() - Bayesian 4-cohort prevalence model
  - compile_stan_results() - Extract and format Stan results
  - create_time_sequence() - Generate prediction timepoints
  - compute_loo_cv() - Leave-one-out cross-validation
  - plot_parameter_diagnostics() - Model diagnostic plots

Troubleshooting

Common Issues

Stan compilation errors:

    # Recompile Stan models
    rstan_options(auto_write = TRUE)
    options(mc.cores = parallel::detectCores())

Memory issues:

  - Reduce Stan iterations: ITER = 1000 instead of ITER = 2000
  - Run analyses sequentially, not in parallel

Missing dependencies:

    # Load all utility functions
    source(here("R", "useful_functions.R"))
    source(here("R", "paper_theme.R"))

Resume from Saved Results

Many scripts save intermediate results:

    # Check for existing model fits
    if (file.exists("model_results.RData")) {
      load("model_results.RData")
    } else {
      # Run full analysis
    }

Output Structure

    Luminex_figs/r_figs/     # All generated figures
    Data_for_publication/    # Final analysis datasets
    stan/                    # Stan model files
    R/                       # Analysis scripts

Expected Runtime

Full pipeline: 10-20 hours on modern hardware
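The strict execution order and the resume-from-saved-results pattern described in the README could be combined in a small wrapper. The following is only a sketch, not code from the repository: `run_step()` and its `out_dir` argument are hypothetical names, and the location where each script writes its outputs is an assumption that should be checked against the scripts themselves.

```r
# Hypothetical helper (not part of the repository): run a pipeline step only
# when at least one of its expected output files is missing, then verify that
# the step actually produced every expected file.
run_step <- function(name, outputs, step_fn, out_dir = ".") {
  paths <- file.path(out_dir, outputs)
  if (all(file.exists(paths))) {
    message("Skipping '", name, "': outputs already present")
    return(invisible(FALSE))
  }
  message("Running '", name, "'")
  step_fn()
  missing <- paths[!file.exists(paths)]
  if (length(missing) > 0) {
    stop("Step '", name, "' did not produce: ",
         paste(missing, collapse = ", "))
  }
  invisible(TRUE)
}

# Example wiring for the first two Phase 1 steps. Script and output names are
# taken from the README; the output directory is an assumption. Wrapped in
# if (FALSE) so this block can be sourced without the repository present.
if (FALSE) {
  run_step("Initial data cleaning",
           c("data_for_cohort_model.csv", "luminex_igg_igm.csv"),
           function() source("R/create_datasets_for_stan_part_1.R"))
  run_step("Serology classification",
           "serology_prob.csv",
           function() rmarkdown::render("R/mixture_model_final.R"))
}
```

Because a step stops with an error when an expected output is missing, a later step never runs on incomplete inputs, which is the property the ordering warning is meant to protect.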
Award ID(s):
2133763 1716698
PAR ID:
10674322
Author(s) / Creator(s):
Publisher / Repository:
Zenodo
Date Published:
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Synchronized seasonal excretion of multiple coronaviruses coincides with high rates of coinfection in immature bats.

     This repo contains instructions and source code for reproducing the statistical analyses in the manuscript.

     Repo Contents
     - scripts: contains the source .R and .stan files to reproduce the analysis. Each file is detailed below in the section corresponding to its statistical analysis.
     - data: contains the raw source data and model-generated output.
     - figures: contains the final output figures from the manuscript. These can be recreated with the CovOZ_Figures_Submission_Clean.R script.

     1. System Requirements

     Hardware Requirements
     Our source code requires only a standard computer. Much of the Markov chain Monte Carlo code is run in parallel, so a computer with ample memory and multiple cores can be advantageous. The runtimes below were generated using a MacBook with the recommended specs (64 GB RAM, 8 cores at 2.7 GHz). The code will also work on a Linux or Windows computer.

     Software Requirements
     Reproducing the statistical analyses requires a current version of R and Stan. We use version 4.4.1 of R and version 2.32.2 of Stan.

     Package dependencies and versions
     Users will need to install the following packages to execute the code. Our versions are effective October 1, 2024.
     - tidyverse 2.0.0
     - lubridate 1.9.3
     - stringr 1.5.1
     - rstan 2.32.6
     - cowplot 1.1.3
     - ggtext 0.1.2
     - jpeg 0.1-10
     - scales 1.3.0
     - tictoc 1.2.1

     2. Installation Guide

     Running the analysis requires:
     - installing R. Depending on wifi speeds, installing R usually takes a few minutes.
     - installing Stan. Depending on wifi speeds, installing Stan usually takes a few minutes.
     - installing the necessary R packages (listed above). Depending on wifi speeds, installing packages usually takes about 30 seconds per package.

     3. Demo

     This source code is not an R package with a formal demo; instead, source code is included for the various analyses in section 4.

     4. Instructions for Use

     4.1 Coinfection Analysis
     Runs chi-squared tests on coinfections of beta 2d.iv and beta 2d.v. Generates summary statistics, test statistics, and p-values from the manuscript.
     - input files: individual_variant_covariates.csv
     - script file: coinfection_final.R
     - run time: approximately 1 second

     4.2 Individual Level Dynamics of Infection: Dynamic Binary Regression
     Runs individual-level dynamic binary regression models. Produces an output file that can recreate figures.
     - input files: individual_variant_covariates.csv
     - script files: logistic_curves_final.R, GP_regression.stan
     - output files: logistic_curve_out.RData
     - run time: approximately 66 minutes

     4.3 Dynamics of Circulation at the Population Level
     Runs combined (individual and pooled data) dynamic models. Produces an output file that can recreate figures.
     - input files: combined_out_variant.csv
     - script files: cluster_curves_final.R, GP_withLL.stan
     - output files: cluster_curves.csv
     - run time: approximately 25 minutes

     4.4 Manuscript Figures
     Combined script that uses output files created by previous scripts to recreate all figures in the manuscript.
     - input files: model_output/cluster_curves.csv, combined_out_variant.csv, individual_variant_covariates.csv, model_output/logistic_curve_out.RData
     - script files: CovOZ_Figures_Submission_Clean.R
     - output files: Figure2_final.png, Figure3_final.png, Figure4_A-D_final.png, Figure6_AP.png, Figure7.png, SIFigure8.png, SIFigure9.png
     - run time: approximately 16 seconds

     4.5 Model Comparison Integrated
     Compares LOOIC values for sets of model frameworks.
     - input files: combined_out_variant.csv
     - script files: Pred_Comparisons.R, GP_withLL.stan
     - output files: preds.RData
     - run time: approximately 2 hours

     4.6 Model Comparison Individual
     Compares LOOIC values for sets of model frameworks.
     - input files: combined_out_variant.csv
     - script files: logistic_curves_loo.R, GP_regression.stan, GP_regression_add.stan, GP_regression_interact.stan
     - output files: logistic_curve_loo_age.RData, logistic_curve_loo_age_add_sex.RData, logistic_curve_loo_age_interact_sex.RData
     - run time: approximately 6 hours 45 minutes
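The coinfection analysis in section 4.1 is described as a chi-squared test on two infection indicators. As a rough illustration of that kind of test, and not a reproduction of coinfection_final.R, the sketch below cross-tabulates two binary columns and calls base R's `chisq.test()`. The column names `beta_2d_iv` and `beta_2d_v` and the simulated data are assumptions made only for this example; the real column names in individual_variant_covariates.csv may differ.

```r
# Simulated stand-in for individual-level data: one row per bat, one 0/1
# indicator per coronavirus variant. These values are made up for illustration.
set.seed(1)
toy <- data.frame(
  beta_2d_iv = rbinom(200, 1, 0.3),
  beta_2d_v  = rbinom(200, 1, 0.3)
)

# 2 x 2 contingency table of infection status for the two variants.
tab <- table(iv = toy$beta_2d_iv, v = toy$beta_2d_v)
print(tab)

# Chi-squared test of independence. For 2 x 2 tables, chisq.test() applies
# Yates' continuity correction by default (correct = TRUE).
res <- chisq.test(tab)
print(res$statistic)
print(res$p.value)
```

A small p-value would indicate that infection with one variant is associated with infection by the other more (or less) often than expected under independence.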
  2. Abstract:

Evolutionary adaptation can allow a population to persist in the face of a new environmental challenge. With many populations now threatened by environmental change, it is important to understand whether this process of evolutionary rescue is feasible under natural conditions, yet work on this topic has been largely theoretical. We used unique long-term data to parameterize deterministic and stochastic models of the contribution of one trait to evolutionary rescue using field estimates for the subalpine plant Ipomopsis aggregata and hybrids with its close relative I. tenuituba. In the absence of evolution or plasticity, the two studied populations are projected to go locally extinct due to earlier snowmelt under climate change, which imposes drought conditions. Phenotypic selection on specific leaf area (SLA) was estimated in 12 years and multiple populations. Those data on selection and its environmental sensitivity to annual snowmelt timing in the spring were combined with previous data on heritability of the trait, phenotypic plasticity of the trait, and the impact of snowmelt timing on mean absolute fitness. Selection favored low values of SLA (thicker leaves). The evolutionary response to selection on that single trait was insufficient to allow evolutionary rescue by itself, but in combination with phenotypic plasticity it promoted evolutionary rescue in one of the two populations. The number of years until population size would stop declining and begin to rise again was heavily dependent upon stochastic environmental changes in snowmelt timing around the trend line. Our study illustrates how field estimates of quantitative genetic parameters can be used to predict the likelihood of evolutionary rescue. Although a complete set of parameter estimates is generally unavailable, it may also be possible to predict the general likelihood of evolutionary rescue based on published ranges for phenotypic selection and heritability and the extent to which early snowmelt impacts fitness.

Methods:

The study sites consisted of three “Poverty Gulch” sites in Gunnison National Forest and one site “Vera Falls” at the Rocky Mountain Biological Laboratory, all in Gunnison County, CO, USA. Focal plants included two sets of plants. One set (data from 2009-2019) consisted of plants in common gardens at three sites: an I. aggregata site (hereafter “agg”), an I. tenuituba site (hereafter “ten”) and a site at the center of the natural hybrid zone (hereafter “hyb”). The second set consisted of plants growing in situ at two of the same Poverty Gulch sites (“agg” and “hyb”), and an I. aggregata site at Vera Falls (hereafter “VF”; data from 2017-2023).

The common gardens were started from seed in 2007 and 2008. Measurements of SLA in these gardens began when plants were 2 years old, either 2009 or 2010 depending upon the garden, as they are only small seedlings during their first summer after seed maturation. By 2018, all but 15 of the 4512 plants originally planted had died, with or without blooming, and we stopped following these gardens. Starting in 2017, in situ vegetative plants at the I. aggregata site and the hybrid site whose longest leaf exceeded 25 mm were marked with metal tags to facilitate identification.

In each year of the study, one leaf from each vegetative plant was collected in the field and transported on ice to the RMBL, 8 km distant. There each leaf was scanned with a flatbed scanner and analyzed using ImageJ to measure leaf area. The leaf was dried at 70 deg C for 2 hours and then weighed to obtain dry mass and calculate SLA as area/dry mass. For plants in the common gardens, SLA was measured on 982 leaves from 383 plants in 2009 – 2014. For in situ plants, SLA was measured on one leaf from each of 877 plants in 2017 – 2022. Fitness was estimated as the binary variable of survival to flowering. Plants that were still alive in 2019 in the common gardens or in 2023 at the end of the study were assumed to survive to flowering.

These data were used to estimate selection differentials on SLA in each of 12 years. We then combined this information with previous information on heritability and the effect of snowmelt date in the spring on mean absolute fitness, measured as the finite rate of population increase, from a previous demographic study. This information was used to parameterize models of evolutionary rescue that we developed. We developed two models that differed in how snowmelt timing changed: a Step-change model and a Gradual environmental change model, and analyzed both deterministic and stochastic versions. All analysis and modeling was done in R ver 4.2.2.

Technical info:

# Data for: Predicting the contribution of single trait evolution to rescuing a plant population from demographic impacts of climate change

Dataset DOI: [10.5061/dryad.ht76hdrtn](10.5061/dryad.ht76hdrtn)

## Description of the data and file structure

File "mastervegtraitsSLA2023.csv" contains data on specific leaf area for Ipomopsis plants in the field. Files "masterdemography_insitu_2023.csv" and "masterdemography_commongarden.csv" provide the corresponding information on survival to flowering. File "snowmelt.csv" provides dates of snowmelt in the spring. File "selection_vs_snowmelt.csv" provides intermediate results on selection intensities from analysis with the first parts of the code "Campbell-EvolutionLettersMay2025.Rmd". File "IPMresults.csv" provides estimates of the finite rate of increase (lambda) predicted from the publication by Campbell [https://doi.org/10.1073/pnas.1820096116](https://doi.org/10.1073/pnas.1820096116). File "Campbell-EvolutionLettersMay2025.Rmd" provides the R code for statistical analysis and the deterministic and stochastic models of evolutionary rescue.

All data analysis and modeling was done in R ver. 4.4.2 on a Windows machine. All necessary input data files are provided. The R code is annotated to indicate which portions produce analyses and figures in the manuscript. For the multipart figures 6-9 the code needs to be manually updated to produce each part of the figure before assembling them. In those cases, each part represents a model with a unique set of parameters.

### Files and variables

#### File: Data_files_for_EVL_Campbell_2025.zip

**Description:** All data files. Blank cells are indicated by "." except in "selection_vs_snowmelt.csv", where they are indicated by "NA".

**File:** mastervegtraitsSLA2023.csv

* meltday = first day of bare ground at the Rocky Mountain Biological Lab (RMBL) in units of days starting with January 1
* year = year
* site = site. agg = site with I. aggregata. hyb = site with natural hybrids. ten = site with I. tenuituba. VF = Vera Falls site containing I. aggregata.
* idtag = metal tag used to identify plant
* planttype = type of plant. AA = progeny of I. aggregata x I. aggregata. AT = progeny of I. aggregata x I. tenuituba. TA = progeny of I. tenuituba x I. aggregata. TT = progeny of I. tenuituba x I. tenuituba. F2 = progeny of F1 (either AT or TA) x F1. agg = natural I. aggregata. hyb = natural hybrid.
* sla = specific leaf area in units of cm2/g
* uniqueid = an id used to identify the plant uniquely across all years and sites

**File:** masterdemography_insitu_2023.csv

* site = site. agg = site with I. aggregata. hyb = site with hybrids. VF = Vera Falls site containing I. aggregata.
* idtag = metal tag used to identify plant
* yeartagged = year the plant was first tagged
* flrlabelxx = label for plants flowering in year 20xx
* stagexxxx = stage in year xxxx. 0 = dead. 1 = single vegetative rosette. 2 = single inflorescence. 3 = multiple vegetative rosette. 4 = multiple inflorescence.
* lengthxx = length of longest leaf in year 20xx in mm
* leavesxx = number of leaves in rosette(s) in year 20xx

**File:** masterdemography_commongarden.csv

* site = site. agg = site with I. aggregata. hyb = site with natural hybrids. ten = site with I. tenuituba.
* IDTAG = metal tag used to identify plant
* Planttype = type of plant. AA = progeny of I. aggregata x I. aggregata. AT = progeny of I. aggregata x I. tenuituba. TA = progeny of I. tenuituba x I. aggregata. TT = progeny of I. tenuituba x I. tenuituba. F2full = full-sib progeny of F1 (either AT or TA) x F1. F2non = non full-sib progeny of F1 x F1.
* stagexx = stage of plant in year 20xx. 0 = dead. 1 = single vegetative rosette. 2 = single inflorescence. 3 = multiple vegetative rosette. 4 = multiple inflorescence.
* lengthxx = length of longest leaf in year 20xx in mm.
* leavesxx = number of leaves in rosette(s) in year 20xx.

**File:** snowmelt.csv

* Year = year
* Snowmelt = day of first bare ground at the RMBL in units of day starting with January 1. Values prior to 1975 were estimated.

**File:** selection_vs_snowmelt.csv

* meltday = day of first bare ground at the RMBL in units of day starting with January 1.
* year = year
* Sbyyearwithsite = standardized selection differential on SLA in model that includes site. These values are reproduced with standard errors in Table 1.
* bwithsite = regression coefficient for raw survival on raw SLA in model that includes site.
* meansurv = mean survival
* covwsla = raw selection differential on SLA
* bwithsitehyb = regression coefficient for raw survival on SLA at site hyb
* meansurvhyb = mean survival at site hyb
* covwslahyb = raw selection differential on SLA at site hyb used in the Gradual environmental change model
* covwslaagg = raw selection differential on SLA at site agg used in the Gradual environmental change model
* meansurvagg = mean survival at site agg
* melthyb = estimated date of bare ground at site hyb
* meltagg = estimated date of bare ground at site agg

**File:** IPMresults.csv

* site = site. agg = site with I. aggregata. hyb = site with natural hybrids.
* day = predicted day of snowmelt (all predictions are from Campbell, D. R. 2019. Early snowmelt projected to cause population decline in a subalpine plant. PNAS (USA) 116(26): 12901-12906). Units are days starting with January 1.
* lambda = predicted finite rate of increase

**File:** Campbell-EvolutionLettersMay2025.Rmd

Contains R code for data analysis and modeling. All analysis and modeling was done in R ver 4.2.2.
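The record describes selection differentials on SLA estimated from a binary fitness measure (survival to flowering). One common textbook formulation, which may or may not match what Campbell-EvolutionLettersMay2025.Rmd does, takes the selection differential as the covariance between relative fitness and the trait, and standardizes it by the trait's standard deviation. The sketch below follows that assumption; `selection_differential()` is a hypothetical helper and the four-plant data set is invented purely to make the arithmetic easy to follow.

```r
# Hypothetical helper: selection differential on a trait from 0/1 survival.
# Assumes S = cov(w, z), where w is survival divided by mean survival
# (relative fitness) and z is the trait. Note that cov() in R is the sample
# covariance (denominator n - 1).
selection_differential <- function(trait, survived) {
  stopifnot(length(trait) == length(survived), mean(survived) > 0)
  w_rel <- survived / mean(survived)   # relative fitness, mean 1
  s_raw <- cov(w_rel, trait)           # raw differential, in trait units
  c(raw = s_raw, standardized = s_raw / sd(trait))
}

# Invented example: the two plants with the lowest SLA survive to flowering.
toy_sla      <- c(1, 2, 3, 4)
toy_survived <- c(1, 1, 0, 0)
out <- selection_differential(toy_sla, toy_survived)
print(out)
```

A negative value means that survivors had lower SLA than the population as a whole, which is the direction the abstract reports (selection favoring thicker leaves).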
  3. Abstract:

Human disturbance can have profound effects on biodiversity, including increasing hybridization between reproductively isolated species. One approach for understanding how human activity affects hybridization dynamics is to evaluate correlations between disturbance (e.g., urbanization, temperature change) and hybridization. Because variation in hybridization can also arise from historical factors unrelated to recent human disturbance, it is essential to account for population structure to avoid spurious correlations. Here, we combine environmental and high-coverage whole-genome resequencing data to investigate how human disturbance and population structure affect hybridization dynamics between a pair of pine sawflies adapted to different pines, Neodiprion lecontei and Neodiprion pinetum. We find that N. lecontei and N. pinetum exhibit strikingly different patterns of population structure, which we hypothesize stems from differences in host. We also find that recent admixture is both asymmetric and geographically variable. Linear regression analyses reveal that admixture proportion is predicted by indirect human disturbance (i.e., climate change) and not direct human disturbance (e.g., urbanization) in both N. lecontei and N. pinetum. Lastly, in N. pinetum, we find evidence of a spurious association between admixture and direct human disturbance that disappears when regression models account for population structure via inclusion of genetic principal component scores as covariates. Together, our data suggest that indirect human disturbance and population structure both contribute to geographic variation in admixture between N. lecontei and N. pinetum. Our study also highlights the importance of adequately controlling for population structure when attempting to identify environmental predictors (human disturbance-related or not) of hybridization.

Technical info:

# Recent climate change and historical population structure predict spatial patterns of admixture between two host-specialized pine sawfly species

Dataset DOI: [10.5061/dryad.kh18932jw](10.5061/dryad.kh18932jw)

## Description of the data and file structure

To perform all analyses, run the scripts provided in the order indicated in the folder/script name (i.e., "01*" to "07*"). NOTE: all bash scripts (*.sh) will have to be modified to run on your cluster. However, the program commands should work as long as you are using the same version of the program (program versions are indicated in the Materials and Methods section of the manuscript).

### Files and variables

#### File: Glover_Linnen_dryad.zip

**Description:** Input files required to run analyses are provided:

1. "allsites_LecPineHetr_filtered_ExclIndvs_minQ20_AllImbHom_refilter_bcftools_norm.vcf.gz" - required for ABBA-BABA analysis ("06_ABBA-BABA" folder); vcf file contains variant and invariant sites for all *N. lecontei*, *N. pinetum*, and *N. hetricki* individuals included in analysis
2. "snps_LecPine_filtered_ExclIndvs_MAF05_minQ20_AllImbHom_refilter_pruned_bcftools_norm.vcf.gz" - required for ADMIXTURE ("03_ADMIXTURE" folder) and principal component analysis (PCA) for both species combined ("04_run_pcas.sh" script) analyses; vcf file contains linkage-pruned SNPs for all *N. lecontei* and *N. pinetum* individuals included in analyses (i.e., contaminated/putative haploid male individuals removed)
3. "snps_LecOnly_filtered_ExclIndvs_MAF05_minQ20_AllImbHom_refilter_pruned_bcftools_norm.vcf.gz" - required for PCA for *N. lecontei* only analysis ("04_run_pcas.sh" script); vcf file contains linkage-pruned SNPs for all *N. lecontei* individuals included in analysis (i.e., contaminated/putative haploid male individuals removed)
4. "snps_PineOnly_noHybrid_filtered_ExclIndvs_MAF05_minQ20_AllImbHom_refilter_pruned_bcftools_norm.vcf.gz" - required for PCA for *N. pinetum* only analysis ("04_run_pcas.sh" script); vcf file contains linkage-pruned SNPs for all *N. pinetum* individuals included in analysis (i.e., contaminated/putative haploid male individuals and F1 hybrid called *N. pinetum* at time of collection due to being collected on white pine removed)
5. "fulldata_LecPine.csv", "fulldata_LecOnly.csv", "fulldata_PineOnly.csv" - full/compiled data for all *N. lecontei* and *N. pinetum* individuals together, all *N. lecontei* individuals, and all *N. pinetum* individuals included in downstream analyses, respectively. These files are required to run the "05_plot_PCAs_ADMIXTURE.R" and "07_run_models.R" scripts.

Description of columns in the *.csv files (#5) above (each row represents an individual sawfly):

1. "ind" = unique individual identifier assigned by the sequencing center (Admera Health)
2. "genPC1_*" = genetic PC1 score from PCA of both species combined ("genPC1_LecPine" in "fulldata_LecPine.csv" file), *N. lecontei* only ("genPC1_LecOnly" in "fulldata_LecOnly.csv" file), or *N. pinetum* only ("genPC1_PineOnly" in "fulldata_PineOnly.csv" file)
3. "genPC2_*" = genetic PC2 score from PCA of both species combined ("genPC2_LecPine" in "fulldata_LecPine.csv" file), *N. lecontei* only ("genPC2_LecOnly" in "fulldata_LecOnly.csv" file), or *N. pinetum* only ("genPC2_PineOnly" in "fulldata_PineOnly.csv" file)
4. "genPC3_*" = genetic PC3 score from PCA of both species combined ("genPC3_LecPine" in "fulldata_LecPine.csv" file), *N. lecontei* only ("genPC3_LecOnly" in "fulldata_LecOnly.csv" file), or *N. pinetum* only ("genPC3_PineOnly" in "fulldata_PineOnly.csv" file)
5. "AG_ID" = unique individual identifier assigned by the Linnen lab
6. "avg_perc_PP_reads_mapped" = percentage of properly paired reads mapped; averaged between the two sequencing batches
7. "MERGED_FINAL_read_count" = total read count across two sequencing batches
8. "species" = species identity
9. "city" = city individual was collected in
10. "state" = state individual was collected in
11. "area" = geographic area ("North" or "Central") that individual was assigned to based on ADMIXTURE ("03_ADMIXTURE" folder) and PCA ("04_run_pcas.sh" script) analyses
12. "latitude" = latitude individual was collected at
13. "longitude" = longitude individual was collected at
14. "host" = pine host individual was collected on
15. "stage" = life stage of individual at time of DNA extraction, either "larva" or adult female ("adultF")
16. "lcPC1" = PC1 score from PCA of land cover proportions (from analyses in "01_LandCover_data" folder)
17. "lcPC2" = PC2 score from PCA of land cover proportions (from analyses in "01_LandCover_data" folder)
18. "change_tmax" = change in average daily maximum temperature from the 1980s to the 2010s in degrees Celsius (from analyses in "02_temperature_data" folder)
19. "site_number" = arbitrary number assigned to sampling site; each unique sampling site (i.e., unique latitude/longitude GPS coordinates) was assigned an arbitrary number (1-189)
20. "grouped_site_number_5km" = grouped site number, where all sampling sites that were within 5km of each other were considered a “grouped site” and were assigned an arbitrary grouped site number
21. "allele_imbalance" = percentage of heterozygote calls displaying allele balance ratios less than 0.3
22. "avg_depth_coverage" = average depth coverage across all linkage-pruned SNPs
23. "missingness" = proportion of sites with a missing genotype
24. "heterozygosity" = proportion of sites with a heterozygous genotype
25. "pop" = population label for making ADMIXTURE plots ("05_plot_PCAs_ADMIXTURE.R" script); [species]_[sampling state]
26. "admix_prop_k2" = admixture proportion for K=2 (i.e., species-level admixture)
27. "lec_blue_k2" = final assignment probability for "blue" genetic group (color assigned by CLUMPAK) that corresponds to *N. lecontei* ancestry across all 100 ADMIXTURE runs for K=2
28. "pine_orange_k2" = final assignment probability for "orange" genetic group (color assigned by CLUMPAK) that corresponds to *N. pinetum* ancestry across all 100 ADMIXTURE runs for K=2
29. "lec_blue_k3" = final assignment probability for "blue" genetic group (color assigned by CLUMPAK) that corresponds to *N. lecontei* ancestry across all 100 ADMIXTURE runs for K=3
30. "pine_orange_k3" = final assignment probability for "orange" genetic group (color assigned by CLUMPAK) that corresponds to *N. pinetum* ancestry across all 100 ADMIXTURE runs for K=3
31. "lec_purple_k3" = final assignment probability for "purple" genetic group (color assigned by CLUMPAK) that corresponds to *N. lecontei* ancestry across all 100 ADMIXTURE runs for K=3
32. "lec_blue_k4" = final assignment probability for "blue" genetic group (color assigned by CLUMPAK) that corresponds to *N. lecontei* ancestry across all 100 ADMIXTURE runs for K=4
33. "pine_orange_k4" = final assignment probability for "orange" genetic group (color assigned by CLUMPAK) that corresponds to *N. pinetum* ancestry across all 100 ADMIXTURE runs for K=4
34. "lec_purple_k4" = final assignment probability for "purple" genetic group (color assigned by CLUMPAK) that corresponds to *N. lecontei* ancestry across all 100 ADMIXTURE runs for K=4
35. "lec_green_k4" = final assignment probability for "green" genetic group (color assigned by CLUMPAK) that corresponds to *N. lecontei* ancestry across all 100 ADMIXTURE runs for K=4

Perform analyses:

The first folder ("01_LandCover_data") contains the R script to calculate land cover proportions within a 5km-radius buffer around each site's GPS coordinates ("get_LandCover_props.R"). The required input land cover raster files to run this script are available in the public domain at ([https://www.mrlc.gov/viewer/](https://www.mrlc.gov/viewer/)) for the United States and ([http://www.cec.org/north-american-environmental-atlas/land-cover-30m-2020/](http://www.cec.org/north-american-environmental-atlas/land-cover-30m-2020/)) for Canada. This folder also contains the input files ("for_lcPCA.csv", "latlong.csv") and script ("LandCover_PCA.R") necessary to complete the PCA of land cover proportions for each of the 189 sampling sites. Specifically, the "for_lcPCA.csv" file contains, for each sampling site, the proportions of each land cover class within a 5km radius buffer around the GPS coordinates of the sampling site; the "latlong.csv" file contains the GPS coordinates (latitude and longitude) of each sampling site.

The second folder ("02_temperature_data") contains the input file ("LatLong.csv") and script ("get_tmax.R") necessary to extract the average of daily maximum temperature for each month within each year (1980-1989 and 2010-2019) for the latitude/longitude that each individual sawfly was collected at. The required input temperature raster files to run this script are available in the public domain at ([https://daac.ornl.gov/cgi-bin/dsviewer.pl?ds_id=2131](https://daac.ornl.gov/cgi-bin/dsviewer.pl?ds_id=2131)) for the United States and Canada.

The third folder ("03_ADMIXTURE") contains the script ("run_admixture.sh") to perform 100 ADMIXTURE runs for each K (1-10). The output file from CLUMPAK for K=2 is also provided ("ClumppIndFile.output"). In this text file, each row represents an individual; the proportions of the individual's ancestry originating from each of the two inferred ancestral populations are provided (last two columns), from which admixture proportions can be extracted.

The script to perform the next analysis (PCA) is provided ("04_run_pcas.sh"). Next, the script to plot the results from the ADMIXTURE and PCA analyses above is provided ("05_plot_PCAs_ADMIXTURE.R").

The next folder ("06_ABBA-BABA") contains the scripts (".sh") and input files (".txt" and ".nwk") to perform the ABBA-BABA analysis for the "North" and "Central" geographic regions separately. For each region, the "*.txt" file indicates which of the four taxa each individual belongs to (i.e., P1, P2, P3, or Outgroup); individuals that are in the vcf file but excluded from the analysis are indicated with an "xxx". The ".nwk" file is a text file that represents the phylogenetic tree of the four taxa above in the Newick format (i.e., uses parentheses and commas to show the relationships between taxa).

Finally, the script to perform the multiple linear regression models and plot the model results is provided ("07_run_models.R").

## Code/software

All scripts to perform analyses are described above. All software and R package versions are described in the Materials and Methods section of the manuscript.
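The abstract's point about spurious associations can be illustrated with a small simulation. This is a sketch under stated assumptions, not the models in "07_run_models.R": the data are simulated so that admixture depends only on population structure, while a land-cover score happens to be correlated with that structure. The column names `admix_prop_k2`, `lcPC1`, and `genPC1_PineOnly` are borrowed from the column descriptions above, but the values and the model formulas are invented for illustration.

```r
# Simulated data: genetic PC1 (population structure) drives admixture, and the
# land-cover PC1 is correlated with genetic PC1 but has no effect of its own.
set.seed(42)
n <- 150
gen_pc1 <- rnorm(n)
toy <- data.frame(
  genPC1_PineOnly = gen_pc1,
  lcPC1           = 0.8 * gen_pc1 + rnorm(n, sd = 0.6),
  admix_prop_k2   = 0.05 + 0.03 * gen_pc1 + rnorm(n, sd = 0.02)
)

# Naive model: disturbance proxy only. The lcPC1 slope picks up the effect of
# the omitted population-structure covariate.
naive <- lm(admix_prop_k2 ~ lcPC1, data = toy)

# Adjusted model: genetic PC score included as a covariate.
adjusted <- lm(admix_prop_k2 ~ lcPC1 + genPC1_PineOnly, data = toy)

print(coef(summary(naive))["lcPC1", ])
print(coef(summary(adjusted))["lcPC1", ])
```

In this toy setup the apparent effect of `lcPC1` shrinks toward zero once the genetic PC is in the model, which mirrors the pattern the authors report for direct human disturbance in N. pinetum.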
  4. This resource contains source code and select data products behind the following Master's Thesis: Platt, L. (2024). Basins modulate signatures of river salinization (Master's thesis). University of Wisconsin-Madison, Freshwater and Marine Sciences. The source code represents an R-based data processing and modeling pipeline written using the R package "targets". Some of the folders in the source code zipfile are intentionally left empty (except for a hidden file ".placeholder") so that the code repository is set up with the required folder structure. To execute this code, download the zip folder, unzip it, and open the salt-modeling-data.Rproj file. Then, follow the instructions in the README.md file for installing packages, building the pipeline, and examining the results. Newer versions of this repository may be available on GitHub at github.com/lindsayplatt/salt-modeling-data. In addition to the source code, this resource contains three data files containing intermediate products of the pipeline. The first two represent data prepared for the random forest modeling. Data download and processing were completed in pipeline phases 1 - 5, and the random forest modeling was completed in phase 6 (see source code).
     - site_attributes.csv contains the USGS gage site numbers and their associated basin attributes.
     - site_classifications.csv contains the classification of a site for both episodic signatures ("Episodic" or "Not episodic") and baseflow salinization signatures ("positive", "none", "negative", or NA). Note that an NA in the baseflow classification column means that the site did not meet minimum data requirements for calculating a trend and was not used in the random forest model for baseflow salinization.
     - site_attribute_details.csv contains a table of each attribute shorthand used as a column name in site_attributes.csv, with its name, units, description, and data source.
  5. {"Abstract":["''Evolutionary rescue'' is the process by which\n populations experiencing severe environmental change avoid extinction\n through adaptation. Applying theory to natural populations and\n conservation targets requires investigating the effects of several\n life-history traits, including longevity. Theory demonstrates that\n longevity can inhibit rescue through slower phenotypic adaptation when\n selection acts once per lifetime, leaving open questions about\n longevity's effects when individuals face multiple rounds of\n selection. We developed a model integrating evolutionary rescue with\n concepts from life-history theory, particularly the trade-off where\n increasing longevity produces slower population growth rates. Our model\n varies longevity by modifying the balance of survival and reproduction,\n with selection acting on survival allowing for adaptation within cohorts.\n We used this model to study life-history strategies with different\n longevities responding to sudden environmental change. Simulations\n demonstrated that higher longevity resulted in more time at low density\n and increased extinction. With perfect trait heritability, rates of\n adaptation were nearly identical across longevities. But at lower\n heritabilities, repeated selection under longevity decoupled mean\n population phenotypes and genotypes, producing a transient phase of rapid\n phenotypic change. Our results demonstrate that longevity impedes rescue\n by slowing population growth but does not always slow rates of adaptation."],"TechnicalInfo":["# Data and code from: Longevity hinders evolutionary rescue through slower\n growth but not necessarily slower adaptation All simulation code (in\n language R, using packages dplyr, tidyr, mc2, and parallel for code and\n ggplot2 and cowplot for figures) and output (in .csv form) for paper\n "Longevity hinders evolutionary rescue through slower growth but not\n necessarily slower adaptation". 
Contents include: * Functions for\n running simulations * Scripts to run simulated experiments (including\n outputs) * Scripts for validating simulation outputs (including figures) *\n Scripts for generating figures in manuscripts ## Main simulation analysis\n output files File `sim_results_m1_allsizes_n.csv`: Mean and variance of\n population size for each of 27 parameter combinations over 101 time steps\n for main batch of simulations. One row corresponds to a summary of one\n time step within a parameter combination. This file was generated by\n script `run_sims/model1_popsize_extinctions.R` and used to generate\n Figures 2 and 3 in main text. Fields: * `s.max` - maximum survival value\n ($$\\widehat{s}$$ in paper) * `var.z` - initial population-level phenotypic\n variance ($$\\gamma^2$$ in paper) * `h2` - heritability ($$h^2$$ in paper) *\n `t` - time step * `nbar` - mean of population size across simulation\n trials with given parameter combination in time step * `nvar` - variance\n of population size across simulation trials with given parameter\n combination in time step * `psrv` - proportion of surviving populations\n (i.e., with size > 0) within parameter combination at the given time\n step * `nn` - number of trials used to estimate mean and variance File\n `sim_results_m1_disaggregated_n.csv`: Simulation output (recording size\n only) for 20 individual trials per parameter combination over 101\n time steps (or time of extinction, whichever occurs first) for each of 27\n parameter combinations in "main" batch of simulations. One row\n corresponds to one time step within one simulation trial. This file was\n generated by script `run_sims/model1_popsize_extinctions.R` and used to\n generate Figure 2 in main text. 
Fields: * `t` - time step * `n` -\n population size in time step * `trial` - simulation trial number * `s.max`\n - maximum survival value ($$\\widehat{s}$$ in paper) * `var.z` - phenotypic\n variance ($$\\gamma^2$$ in paper) * `h2` - heritability ($$h^2$$ in paper) File\n `sim_results_m1_phtype.csv`: Mean and variance of each of three phenotypic\n components (breeding value, environmental component of phenotype, and\n phenotype) for 200 simulated populations over time for nine parameter\n combinations over 51 time steps. One row corresponds to one time step\n within one simulation trial. This file was generated by script\n `run_sims/model1_phenotypic_components.R` and used to generate Figures 4\n and 6 in main text and Fig. S28 in Supporting Info. Fields: * `t` - time\n step * `n` - population size in time step (*note: not used in analysis*) *\n `bbar` - mean breeding value in population within time step ($$\\bar{b}$$ in\n paper) * `bvar` - variance of breeding values in population within time\n step ($$\\gamma^2_a$$ in paper) * `bvarw` - *not used in analysis* * `ebar` -\n mean environmental component of phenotype in population within time step\n ($$\\bar{e}$$ in paper) * `evar` - variance of environmental component of\n phenotype in population within time step ($$\\gamma^2_e$$ in paper) * `evarw`\n - *not used in analysis* * `zbar` - mean phenotype in population within\n time step ($$\\bar{z}$$ in paper) * `zvar` - variance of phenotypes in\n population within time step ($$\\gamma^2$$ in paper) * `zvarw` - *not used in\n analysis* * `trial` - simulation trial number * `s.max` - maximum survival\n value ($$\\widehat{s}$$ in paper) * `h2` - heritability ($$h^2$$ in paper) File\n `sim_results_m1_ages.csv`: Mean and variance of each of three phenotypic\n components (breeding value, environmental component of phenotype, and\n phenotype) and size of each cohort (age class) for 200 simulated\n populations over time for nine parameter combinations over 51 time steps.\n One row 
corresponds to one cohort within time step within one simulation\n trial. This file was generated by script `run_sims/model1_age_structure.R`\n and used to generate Figure 5 in main text and Fig. S27 in Supporting\n Info. Fields: * `t` - time step * `age` - age of cohort (denoted $$k$$ in\n paper) * `n` - cohort size in time step * `r` - number of offspring sired\n by cohort (*note: not used in analysis*) * `bbar` - mean breeding value in\n cohort within time step ($$\\bar{b}$$ in paper) (*note: not used in\n analysis*) * `bvar` - variance of breeding values in cohort within time\n step ($$\\gamma^2_{a,k}$$ in paper) (*note: not used in analysis*) * `ebar` -\n mean environmental component of phenotype in cohort within time step\n ($$\\bar{e}_k$$ in paper) (*note: not used in analysis*) * `evar` - variance\n of environmental component of phenotype in cohort within time step\n ($$\\gamma^2_{e,k}$$ in paper) (*note: not used in analysis*) * `zbar` - mean\n phenotype in cohort within time step ($$\\bar{z}_k$$ in paper) (*note: not\n used in analysis*) * `zvar` - variance of phenotypes in cohort within time\n step ($$\\gamma^2_k$$ in paper) (*note: not used in analysis*) * `trial` -\n simulation trial number * `s.max` - maximum survival value ($$\\widehat{s}$$\n in paper) * `h2` - heritability ($$h^2$$ in paper) ## Additional simulations\n ### Equal-$$\\lambda$$ simulations These additional simulations were run with\n all life-history strategies having equal maximum per-time step growth\n rates rather than maximum intrinsic fitness equivalent (as in main\n simulations). Files are named and largely structured similarly to the main\n output files described above but with the following exceptions: Files\n `sim_results_m1_allsizes_n_equal-lambda.csv` and\n `sim_results_m1_disaggregated_n_equal-lambda.csv`: These files include\n summaries of only 500 trials per combination. 
These were created by script\n `run_sims/equal_lambda/model1_popsize_extinctions_lambda.R` and were used\n to make Figs. S17 and S18 in Supporting Information. Files\n `sim_results_m1_phenotypic_components_equal-lambda.csv` and\n `sim_results_m1_ages_equal-lambda.csv`: These files recorded the\n equilibrium population size (`lstar`, equivalent to $$\\lambda^*$$ in main\n text) of each life-history strategy. The phenotypic component file\n includes the `s.max` used in the treatment, but by mistake the age\n structure file does not. The longevity treatments are assigned using the\n following `R` code: ``` n.all = mutate( n.all, long = cut(lstar, breaks =\n c(0, 1.185, 1.19, Inf), labels = c('low', 'medium',\n 'high')) ) ``` These files were generated, respectively, by\n scripts `run_sims/equal_lambda/model1_phenotypic_components_lambda.R` and\n `run_sims/equal_lambda/model1_age_structure_lambda.R` and were used to\n make Figures S20-22. ### Equal-$$\\gamma^2_0$$ simulations These additional\n simulations were run with all life-history strategies having equal\n phenotypic variance in newborn cohorts rather than population-level\n phenotypic variance equivalent (as in main simulations). Files are named\n and largely structured similarly to the main output files described above\n but with the following exceptions: Files\n `sim_results_m1_allsizes_n_equalg20.csv` and\n `sim_results_m1_disaggregated_n_equalg20.csv`: Note that here the variable\n `var.z` corresponds to the phenotypic variance in a newborn cohort (rather\n than at the population level). These files were created by the script\n `run_sims/model1_popsize_extinctions_equal_g20.R` and were used to make\n Figs. S23 and S24 in Supporting Information. File\n `sim_results_m1_phtype_equalg20.csv`: This file was created by the script\n `run_sims/model1_phenotpyic_components_equal_g20.R` and was used to make\n Figures S25 and S26 in Supporting Information. 
### Note on variances and\n "missing" data (`NA` in fields) All variances reported in these\n files are sample variances rather than population variances. `NA` appears\n in some files in columns reporting variances (`bvar`, `evar`, and `zvar`)\n where the sample size is one (i.e., `n` is equal to `1` for the\n corresponding row in the file). This is because the variance reported is\n the sample variance, which is undefined for a single observation, so the `R`\n function `var()` returns `NA`. ## Scripts All code to re-create\n the analysis, including but not limited to the data files presented here, is\n publicly available on GitHub at the address\n [https://github.com/melbourne-lab/evo_rescue_longevity](https://github.com/melbourne-lab/evo_rescue_longevity). Please contact [scottwatsonnordstrom@gmail.com](mailto:scottwatsonnordstrom@gmail.com) with questions."]} 
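Two points from the README above can be checked with a short, self-contained R sketch: how the `cut()` breaks map `lstar` values to longevity treatments, and why variance columns contain `NA` when `n` is `1`. The `lstar` values and the variance inputs below are hypothetical, and base R assignment is used in place of `dplyr::mutate()` so the example needs no packages; the breaks and labels are copied from the README snippet.

```r
# Hypothetical lstar values; real values come from the equal-lambda output files.
n.all <- data.frame(lstar = c(1.18, 1.187, 1.20))

# Same breaks and labels as the README snippet, applied with base R
# rather than dplyr::mutate().
n.all$long <- cut(n.all$lstar,
                  breaks = c(0, 1.185, 1.19, Inf),
                  labels = c("low", "medium", "high"))
print(n.all)

# Sample variance divides by (n - 1), so it is undefined for one observation
# and var() returns NA; with two or more observations it is defined.
single_var <- var(c(3.2))
pair_var   <- var(c(3.2, 4.2))
print(c(single = single_var, pair = pair_var))
```

Because the age-structure file lacks `s.max`, a recoding like this is one way to recover the longevity treatment from `lstar` when reading that file.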