Title: Recent climate change and historical population structure predict spatial patterns of admixture between two host-specialized pine sawfly species
{"Abstract":["Human disturbance can have profound effects on biodiversity, including\n increasing hybridization between reproductively isolated species. One\n approach for understanding how human activity affects hybridization\n dynamics is to evaluate correlations between disturbance (e.g.,\n urbanization, temperature change) and hybridization. Because variation in\n hybridization can also arise from historical factors unrelated to recent\n human disturbance, it is essential to account for population structure to\n avoid spurious correlations. Here, we combine environmental and\n high-coverage whole-genome resequencing data to investigate how human\n disturbance and population structure affect hybridization dynamics between\n a pair of pine sawflies adapted to different pines, Neodiprion lecontei\n and Neodiprion pinetum. We find that N. lecontei and N. pinetum exhibit\n strikingly different patterns of population structure, which we\n hypothesize stems from differences in host. We also find that recent\n admixture is both asymmetric and geographically variable. Linear\n regression analyses reveal that admixture proportion is predicted by\n indirect human disturbance (i.e., climate change) and not direct human\n disturbance (e.g., urbanization) in both N. lecontei and N. pinetum.\n Lastly, in N. pinetum, we find evidence of a spurious association between\n admixture and direct human disturbance that disappears when regression\n models account for population structure via inclusion of genetic principal\n component scores as covariates. Together, our data suggest that indirect\n human disturbance and population structure both contribute to geographic\n variation in admixture between N. lecontei and N. pinetum. 
Our study also\n highlights the importance of adequately controlling for population\n structure when attempting to identify environmental predictors (human\n disturbance-related or not) of hybridization."],"TechnicalInfo":["# Recent climate change and historical population structure predict\n spatial patterns of admixture between two host-specialized pine sawfly\n species Dataset DOI: [10.5061/dryad.kh18932jw](https://doi.org/10.5061/dryad.kh18932jw) ##\n Description of the data and file structure To perform all analyses, run\n scripts provided in the order indicated in the folder/script name (i.e.,\n "01*" to "07*"). NOTE: for all bash scripts (*.sh),\n you will have to modify them to run on your cluster. However, the program\n commands should work as long as you are using the same version of the\n program (program versions indicated in the Materials and Methods section of\n the manuscript). ### Files and variables #### File:\n Glover_Linnen_dryad.zip **Description:**  Input files required to run\n analyses are provided: 1.\n "allsites_LecPineHetr_filtered_ExclIndvs_minQ20_AllImbHom_refilter_bcftools_norm.vcf.gz" - required for ABBA-BABA analysis ("06_ABBA-BABA" folder); vcf file contains variant and invariant sites for all *N. lecontei*, *N. pinetum*, and *N. hetricki* individuals included in analysis 2. "snps_LecPine_filtered_ExclIndvs_MAF05_minQ20_AllImbHom_refilter_pruned_bcftools_norm.vcf.gz" - required for ADMIXTURE ("03_ADMIXTURE" folder) and principal component analysis (PCA) for both species combined ("04_run_pcas.sh" script) analyses; vcf file contains linkage-pruned SNPs for all *N. lecontei* and *N. pinetum* individuals included in analyses (i.e., contaminated/putative haploid male individuals removed) 3. "snps_LecOnly_filtered_ExclIndvs_MAF05_minQ20_AllImbHom_refilter_pruned_bcftools_norm.vcf.gz" - required for PCA for *N. lecontei* only analysis ("04_run_pcas.sh" script); vcf file contains linkage-pruned SNPs for all *N. 
lecontei* individuals included in analysis (i.e., contaminated/putative haploid male individuals removed) 4. "snps_PineOnly_noHybrid_filtered_ExclIndvs_MAF05_minQ20_AllImbHom_refilter_pruned_bcftools_norm.vcf.gz" - required for PCA for *N. pinetum* only analysis ("04_run_pcas.sh" script); vcf file contains linkage-pruned SNPs for all *N. pinetum* individuals included in analysis (i.e., contaminated/putative haploid male individuals and F1 hybrid called *N. pinetum* at time of collection due to being collected on white pine removed) 5. "fulldata_LecPine.csv", "fulldata_LecOnly.csv", "fulldata_PineOnly.csv" - full/compiled data for all *N. lecontei* and *N. pinetum* individuals together, all *N. lecontei* individuals, and all *N. pinetum* individuals included in downstream analyses, respectively. These files are required to run the "05_plot_PCAs_ADMIXTURE.R" and "07_run_models.R" scripts. Description of columns in the *.csv files (#5) above (each row represents an individual sawfly): 1. "ind" = unique individual identifier assigned by the sequencing center (Admera Health) 2. "genPC1_*" = genetic PC1 score from PCA of both species combined ("genPC1_LecPine" in "fulldata_LecPine.csv" file), *N. lecontei* only ("genPC1_LecOnly" in "fulldata_LecOnly.csv" file), or *N. pinetum* only ("genPC1_PineOnly" in "fulldata_PineOnly.csv" file) 3. "genPC2_*" = genetic PC2 score from PCA of both species combined ("genPC2_LecPine" in "fulldata_LecPine.csv" file), *N. lecontei* only ("genPC2_LecOnly" in "fulldata_LecOnly.csv" file), or *N. pinetum* only ("genPC2_PineOnly" in "fulldata_PineOnly.csv" file) 4. "genPC3_*" = genetic PC3 score from PCA of both species combined ("genPC3_LecPine" in "fulldata_LecPine.csv" file), *N. lecontei* only ("genPC3_LecOnly" in "fulldata_LecOnly.csv" file), or *N. pinetum* only ("genPC3_PineOnly" in "fulldata_PineOnly.csv" file) 5. "AG_ID" = unique individual identifier assigned by the Linnen lab 6. 
"avg_perc_PP_reads_mapped" = percentage of properly paired reads mapped; averaged between the two sequencing batches 7. "MERGED_FINAL_read_count" = total read count across two sequencing batches 8. "species" = species identity 9. "city" = city individual was collected in 10. "state" = state individual was collected in 11. "area" = geographic area ("North" or "Central") that individual was assigned to based on ADMIXTURE ("03_ADMIXTURE" folder) and PCA ("04_run_pcas.sh" script) analyses 12. "latitude" = latitude individual was collected at 13. "longitude" = longitude individual was collected at 14. "host" = pine host individual was collected on 15. "stage" = life stage of individual at time of DNA extraction, either "larva" or adult female ("adultF") 16. "lcPC1" = PC1 score from PCA of land cover proportions (from analyses in "01_LandCover_data" folder) 17. "lcPC2" = PC2 score from PCA of land cover proportions (from analyses in "01_LandCover_data" folder) 18. "change_tmax" = change in average daily maximum temperature from the 1980's to the 2010's in degrees Celsius (from analyses in "02_temperature_data" folder) 19. "site_number" = arbitrary number assigned to sampling site; each unique sampling site (i.e., unique latitude/longitude GPS coordinates) was assigned an arbitrary number (1-189) 20. "grouped_site_number_5km" = grouped site number, where all sampling sites that were within 5km of each other were considered a “grouped site” and were assigned an arbitrary grouped site number 21. "allele_imbalance" = percentage of heterozygote calls displaying allele balance ratios less than 0.3 22. "avg_depth_coverage" = average depth coverage across all linkage-pruned SNPs 23. "missingness" = proportion of sites with a missing genotype 24. "heterozygosity" = proportion of sites with a heterozygous genotype 25. "pop" = population label for making ADMIXTURE plots ("05_plot_PCAs_ADMIXTURE.R" script); [species]_[sampling state] 26. 
"admix_prop_k2" = admixture proportion for K=2 (i.e., species-level admixture) 27. "lec_blue_k2" = final assignment probability for "blue" genetic group (color assigned by CLUMPAK) that corresponds to *N. lecontei* ancestry across all 100 ADMIXTURE runs for K=2 28. "pine_orange_k2" = final assignment probability for "orange" genetic group (color assigned by CLUMPAK) that corresponds to *N. pinetum* ancestry across all 100 ADMIXTURE runs for K=2 29. "lec_blue_k3" = final assignment probability for "blue" genetic group (color assigned by CLUMPAK) that corresponds to *N. lecontei* ancestry across all 100 ADMIXTURE runs for K=3 30. "pine_orange_k3" = final assignment probability for "orange" genetic group (color assigned by CLUMPAK) that corresponds to *N. pinetum* ancestry across all 100 ADMIXTURE runs for K=3 31. "lec_purple_k3" = final assignment probability for "purple" genetic group (color assigned by CLUMPAK) that corresponds to *N. lecontei* ancestry across all 100 ADMIXTURE runs for K=3 32. "lec_blue_k4" = final assignment probability for "blue" genetic group (color assigned by CLUMPAK) that corresponds to *N. lecontei* ancestry across all 100 ADMIXTURE runs for K=4 33. "pine_orange_k4" = final assignment probability for "orange" genetic group (color assigned by CLUMPAK) that corresponds to *N. pinetum* ancestry across all 100 ADMIXTURE runs for K=4 34. "lec_purple_k4" = final assignment probability for "purple" genetic group (color assigned by CLUMPAK) that corresponds to *N. lecontei* ancestry across all 100 ADMIXTURE runs for K=4 35. "lec_green_k4" = final assignment probability for "green" genetic group (color assigned by CLUMPAK) that corresponds to *N. lecontei* ancestry across all 100 ADMIXTURE runs for K=4 Perform analyses: The first folder ("01_LandCover_data") contains the R script to calculate land cover proportions within a 5km-radius buffer around each site's GPS coordinates ("get_LandCover_props.R"). 
The required input land cover raster files to run this script are available in the public domain at ([https://www.mrlc.gov/viewer/](https://www.mrlc.gov/viewer/)) for the United States and ([http://www.cec.org/north-american-environmental-atlas/land-cover-30m-2020/](http://www.cec.org/north-american-environmental-atlas/land-cover-30m-2020/)) for Canada. This folder also contains the input files ("for_lcPCA.csv", "latlong.csv") and script ("LandCover_PCA.R") necessary to complete the PCA of land cover proportions for each of the 189 sampling sites. Specifically, the "for_lcPCA.csv" file contains, for each sampling site, the proportions of each land cover class within a 5km radius buffer around the GPS coordinates of the sampling site; the "latlong.csv" file contains the GPS coordinates (latitude and longitude) of each sampling site. The second folder ("02_temperature_data") contains the input file ("LatLong.csv") and script ("get_tmax.R") necessary to extract the average of daily maximum temperature for each month within each year (1980-1989 and 2010-2019) for the latitude/longitude that each individual sawfly was collected at. The required input temperature raster files to run this script are available in the public domain at ([https://daac.ornl.gov/cgi-bin/dsviewer.pl?ds_id=2131](https://daac.ornl.gov/cgi-bin/dsviewer.pl?ds_id=2131)) for the United States and Canada. The third folder ("03_ADMIXTURE") contains the script ("run_admixture.sh") to perform 100 ADMIXTURE runs for each K (1-10). The output file from CLUMPAK for K=2 is also provided ("ClumppIndFile.output"). In this text file, each row represents an individual; the proportion of the individual's ancestry originating from each of the two inferred ancestral populations are provided (last two columns), from which admixture proportions can be extracted. The script to perform the next analysis (PCA) is provided ("04_run_pcas.sh"). 
Next, the script to plot the results from the ADMIXTURE and PCA analyses above is provided ("05_plot_PCAs_ADMIXTURE.R"). The next folder ("06_ABBA-BABA") contains the scripts (".sh") and input files (".txt" and ".nwk") to perform the ABBA-BABA analysis for the "North" and "Central" geographic regions separately. For each region, the ".txt" file indicates which of the four taxa each individual belongs to (i.e., P~1~, P~2~, P~3~, or Outgroup); individuals that are in the vcf file but excluded from the analysis are indicated with an "xxx". The ".nwk" file is a text file that represents the phylogenetic tree of the four taxa above in the Newick format (i.e., uses parentheses and commas to show the relationships between taxa). Finally, the script to perform the multiple linear regression models and plot the model results is provided ("07_run_models.R"). ## Code/software All scripts to perform analyses are described above. All software and R package versions are described in the Materials and Methods section of the manuscript."]}
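The abstract above reports a spurious association between admixture and direct human disturbance that disappears once genetic principal component scores are included as covariates. The following Python sketch illustrates that effect on simulated values only. The variable names echo the `lcPC1`, `genPC1_PineOnly`, and `admix_prop_k2` columns, but the numbers, the single-covariate setup, and the helper functions `slope`, `residuals`, and `adjusted_slope` are illustrative assumptions; the authors' actual models are in "07_run_models.R".

```python
import random
import statistics


def slope(x, y):
    """Ordinary least-squares slope of y on x (with intercept)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def residuals(x, y):
    """Residuals of y after regressing it on x (with intercept)."""
    b = slope(x, y)
    a = statistics.fmean(y) - b * statistics.fmean(x)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


def adjusted_slope(x, y, covariate):
    """Slope of y on x after partialling the covariate out of both
    (Frisch-Waugh-Lovell); this equals the coefficient on x in a
    multiple regression of y on x and the covariate."""
    return slope(residuals(covariate, x), residuals(covariate, y))


# Simulated data (NOT from the dataset, arbitrary units): population
# structure drives both the land-cover score and the admixture response.
rng = random.Random(0)
gen_pc1 = [rng.gauss(0, 1) for _ in range(500)]
lc_pc1 = [g + rng.gauss(0, 0.5) for g in gen_pc1]
admix = [2.0 * g + rng.gauss(0, 0.1) for g in gen_pc1]

naive = slope(lc_pc1, admix)                      # ignores structure
adjusted = adjusted_slope(lc_pc1, admix, gen_pc1)  # controls for it
print(f"naive slope: {naive:.2f}, adjusted slope: {adjusted:.2f}")
```

In this toy setup the land-cover score has no direct effect on the response, so the naive slope is large while the adjusted slope is close to zero. With several genetic PCs as covariates, a multiple-regression routine would replace the single-covariate helper.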
Award ID(s):
1750946 2348574
PAR ID:
10667648
Author(s) / Creator(s):
;
Publisher / Repository:
Dryad
Date Published:
Edition / Version:
3
Subject(s) / Keyword(s):
FOS: Biological sciences; Hybridization; human disturbance; Neodiprion; admixture; Population structure; Climate change
Format(s):
Medium: X Size: 42622951580 bytes
Size(s):
42622951580 bytes
Sponsoring Org:
National Science Foundation
More Like this
  1. {"Abstract":["Why rivers confine flow to a single channel (single-thread planform) or\n divide flow into multiple sub-channels (multi-thread planform) forms a\n longstanding fundamental question in river science, which to date remains\n poorly understood. In the associated manuscript, we probe planform origins\n using a novel dataset of 11+ million riverbank migration vectors mapped\n from 36 years of global satellite imagery along 84 river systems. Results\n show single-thread rivers originate from a balance between bank erosion\n and opposing-bar deposition, which maintains an equilibrium width as\n channels migrate. In contrast, multi-thread rivers originate from\n imbalance: bank erosion outpaces opposing-bar deposition, causing\n sub-channels to repeatedly widen and split. This width instability\n challenges equilibrium paradigms in river science, endangers riverside\n communities, and lowers the potential costs of nature-based river\n restoration projects along multi-thread rivers. Here, we provide the data\n and codes that form the foundation of this manuscript. "],"TechnicalInfo":["# Data and code for: River planforms originate from (im)balance between\n bank erosion and bar accretion --- author: Austin Chadwick contact:\n [achadwick@ldeo.columbia.edu](mailto:achadwick@ldeo.columbia.edu),\n [austin.chadwick23@gmail.com](mailto:austin.chadwick23@gmail.com) These\n materials are organized into two folders, Codes and Data. The Data folder\n contains spreadsheets, GIS geopackages, remote sensing images, and MATLAB\n data files for the associated manuscript "River planforms originate\n from (im)balance between bank erosion and bar accretion". The Codes\n folder provides MATLAB scripts and functions to generate the analysis and\n figures of the main manuscript. The last folder, temp, is empty; it acts\n as a temporary destination for output files that can be generated by the\n codes. 
## Description of the data and file structure All data are found in\n the folder "Data", which can be accessed by opening the zip\n file "Data.zip". The following is a breakdown of each file and\n subfolder in the Data folder. * DataLog_082224_1.xlsx This Excel document\n records the files used to apply PIV to remote sensing imagery, organized\n on a site-by-site basis. All files referenced therein are found in the\n Data folder. See the first sheet therein "Readme" for details. *\n Polygons subfolder This subfolder contains geopackage files (.gpkg) for\n each study reach in the main manuscript. Each file contains geopackage\n data for a single polygon that outlines and defines the study reach. These\n data were used to extract GEOTIFFs using Google Earth Engine, and were\n obtained by manual selection in QGIS. See main manuscript for details on\n the selection process and sizing of each polygon. * Geotiffs subfolder\n This folder contains DSWE-derived river-mask GEOTIFFs derived from Landsat\n median annual composites, used for the analysis and figures of the main\n manuscript "River planforms originate from (im)balance between bank\n erosion and bar accretion". The DSWE-derived river masks were created\n in Google Earth Engine using the "GEE_watermasks" code found on\n the GitHub Repository\n ([https://github.com/evan-greenbrg/GEE_watermasks](https://github.com/evan-greenbrg/GEE_watermasks)). Each subfolder corresponds to a study reach in the main manuscript (see Table 1 in main manuscript), with nested subfolders containing river masks derived using DSWE confidence levels 1 through 4 (mask1, mask2, mask3, mask4). * PreparedImagery subfolder This folder contains true-color images and DSWE-derived river masks (.tif) for each study reach. These data were used as input data for PIV. 
For each site, there is a subfolder containing truecolor images ("Color") and sixteen subfolders containing DSWE-derived river masks for the four different confidence levels and four different tilt angles ("MaskX_TiltYY", where X is the confidence level and YY is the tilt angle). In each of these subfolders, there are two nested subfolders "RemovedBlanks" and "RemovedForSubsampling" which contain additional .tif's that were removed because they were blank or because optimal PIV analysis required subsampling, respectively. For details of this process, see: [https://doi.org/10.1029/2023JF007177](https://doi.org/10.1029/2023JF007177). * OutputFromPIVlab subfolder This folder contains the Raw PIV data output from PIVlab software for each site. Each site-specific subfolder contains sixteen .mat files, each of which corresponds to a specific confidence level and tilt angle in the prepared imagery (see DataLog_082224_1.xlsx for details on which file corresponds to which confidence level and tilt angle). For additional instructions on how to operate PIVlab , please see the following links: [http://sead-published.ncsa.illinois.edu/seadrepository/api/researchobjects/urn:uuid:6154f24ae4b0312e761fb761](http://sead-published.ncsa.illinois.edu/seadrepository/api/researchobjects/urn:uuid:6154f24ae4b0312e761fb761) [https://www.mathworks.com/matlabcentral/fileexchange/27659-pivlab-particle-image-velocimetry-piv-tool-with-gui](https://www.mathworks.com/matlabcentral/fileexchange/27659-pivlab-particle-image-velocimetry-piv-tool-with-gui) * PostprocessedPIV subfolder This folder contains the postprocessed PIV data output. Each site-specific subfolder contains four .mat files. These files contain the postprocessed data before filtering (Unfiltered.mat), the postprocessed data after filtering ("Filtered.mat"), the postprocessed data after filtering and setting all NaN values along high-confidence banks to zero ("Filtered_NanToZeroOnBank"). 
* BankNormalVectors subfolder This folder contains the bank-normal vector fields for each image. The data structure is identical to the PostprocessedPIV data, with a single .mat file for each river, "BankNormalVectors.mat". All vectors have a length of 1, and are oriented perpendicular to the nearest channel bank, pointing away from the wetted area and towards the dry area. For details on this calculation, see: [https://doi.org/10.1029/2021WR031236](https://doi.org/10.1029/2021WR031236). * CrossSections subfolder This folder contains the data for the thread cross sections randomly sampled in each reach to calculate the differential migration rate (delv). Each site-specific subfolder contains a single .mat file, "CrossSections.mat". This file contains the positions (x,y) and the bank migration vectors (u,v) for each cross section. Spatial units are in pixels, and temporal units are in frames, such that velocities are in units of pixels/frame. * WettedAreas subfolder This folder contains the data for the wetted channel areas, used to investigate apparent changes in water discharge and plot supplemental figures. Each site-specific subfolder contains a single .mat file, "WettedAreas.mat". This file contains the positions (x,y) and the bank migration vectors (u,v) for each cross section. ## Sharing/Access information Some data referenced in the Datalog_082224.xlsx spreadsheet was derived from previous work: * Sylvester et al., 2019: [https://doi.org/10.1130/G45608.1](https://doi.org/10.1130/G45608.1) * Rowland et al., 2019: [https://doi.org/10.15485/1571527](https://doi.org/10.15485/1571527) * Galeazzi et al., 2021: [https://doi.org/10.1130/G49121.1](https://doi.org/10.1130/G49121.1) * Ielpi et al., 2023: [https://doi.org/10.1029/2022GL101285](https://doi.org/10.1029/2022GL101285) ## Code/Software All codes are found in the folder "Codes", which can be accessed by opening the zip folder "Codes.zip". 
To run any codes, it is necessary to add the "Codes" and "Data" folders (and their subfolders) to your current MATLAB path. This can be easily achieved by right-clicking the folder in the MATLAB GUI and selecting "Add To Path-->SelectedFolders and Subfolders", and will ensure that all input and output files are identified properly. The following is a breakdown of each file and subfolder in the Codes folder: * AnalsysisAndFigures1_Worldmap_082924_1.m This MATLAB script reproduces Figure 1A of the manuscript and its associated analysis. * AnalysisAndFigures2_PIVmethods_082924_1.m This MATLAB script reproduces Figure 1E of the manuscript and its associated analysis. * AnalysisAndFigures3_DifferentialMigration_082924_1.m This MATLAB script reproduces Figure 2A and Figure 4 of the manuscript, Text S1 Figures 1–4 of the supplement, and their associated analysis. It also reproduces the tabular data in Data S1, performs the statistical tests on delv* reported in the text, and counts the number of vector measurements in our database as reported in the text. * AnalysisAndFigures4_TimeSlices_082924_1.m This MATLAB script reproduces Figure 2B-G and Figure 4 of the manuscript and Figures S1–S3 of the supplement and their associated analysis. In the first code block (L23–36), the user designates the reach and cross section to plot by uncommenting the corresponding line. See code's comments on L23–36 for details. * AnalysisAndFigures5_MaterialsAndMethods_082924_1 This MATLAB script reproduces the supplement's Materials & Methods Figures 1–6 and their associated analysis. In the first code block (L19-23), the user designates the reach to plot by uncommenting the corresponding line. See code's comments on L19-23 for details. * AnalysisAndFigures6_SuppMovieFrames_082924_1.m This MATLAB script reproduces the frames of supplementary Movies S1–S3. In the first code block (L18-23), the user designates the reach to plot by uncommenting the corresponding line. 
See code's comments on L18-23 for details. * LatLonBySite_081924_1.xlsx This excel document records the approximate latitude and longitude coordinates of each studied reach in the main manuscript. These coordinates were used to generate Fig. 1 in the main manuscript, and were obtained by manually inspecting the coordinates of each .gpkg file in QGIS. The precise geographic coordinates for each reach can be found in the .gpkg files, in the Polygons subfolder described above. ### Change Log Version 2: minor changes in README."]} 
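The README above states that bank migration vectors (u, v) are stored in pixels/frame and that bank-normal vectors have unit length and point from the wetted area toward the dry area. A minimal Python sketch of how such a vector could be projected onto the bank normal and converted to physical units follows. The function name `bank_normal_rate`, the 30 m pixel size (the native Landsat resolution), the one-year frame interval (suggested by the annual composites), and the reading of a positive projection as erosion are all assumptions for illustration, not the authors' MATLAB implementation.

```python
def bank_normal_rate(u, v, nx, ny, pixel_m=30.0, years_per_frame=1.0):
    """Project a migration vector (u, v) in pixels/frame onto a unit
    bank-normal vector (nx, ny) and convert to metres per year.

    Assumed sign convention: the normal points toward the dry area, so a
    positive result means the bank moves into dry land (erosion) and a
    negative result means it moves into the channel (accretion).
    """
    pixels_per_frame = u * nx + v * ny  # dot product with the unit normal
    return pixels_per_frame * pixel_m / years_per_frame


# Hypothetical vectors, not taken from the dataset.
eroding = bank_normal_rate(u=0.5, v=0.0, nx=1.0, ny=0.0)
accreting = bank_normal_rate(u=-0.2, v=0.0, nx=1.0, ny=0.0)
print(eroding, accreting)
```

If the imagery was resampled or the frame spacing differs from one year, `pixel_m` and `years_per_frame` would need to be changed accordingly.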
  2. {"Abstract":["As advances in semiconductor technology drive higher power densities,\n thermal management is becoming increasingly crucial to maintaining the\n performance and reliability of devices. In recent decades, computational\n fluid dynamics (CFD) has been widely used for thermal management system\n design. However, the computational requirement of CFD limits its use for\n comprehensive system optimization and digital twin active control systems,\n which require quick, high-fidelity thermal field prediction. Data-driven\n surrogate models address the speed limitations of CFD but are often\n treated as black boxes whose predictions are physically uninterpretable.\n Neural networks (NN) combined with proper orthogonal decomposition (POD)\n maintain the speedup of other surrogate models while offering a physically\n interpretable latent space. The interpretability of POD-NN learned\n representations remains largely unexplored. This work develops and\n demonstrates a POD-NN surrogate model framework to predict 2D thermal\n fields for a liquid-cooled dual-chip cold plate with four retained POD\n modes. The resultant model predicts fields 412,500× faster than\n CFD. An error decomposition analysis is used to assess the contribution of\n POD-based and NN-based error across a range of dataset sizes and identify\n an error floor dependent on the number of modes retained rather than\n dataset size. A systematic Pearson correlation analysis is used to link\n all four POD coefficients to identifiable physical features, and thermal\n resistance is mapped in latent space. 
The presented framework enables\n sensitivity analysis, design optimization, and active thermal control\n within the physics-aware POD coefficient space, and advances POD-NN\n architectures from accelerators to interpretable engineering design tools."],"TechnicalInfo":["# Data from: Physically interpretable surrogate modeling of thermal fields\n in electronics cooling using combined proper orthogonal decomposition and\n neural networks Dataset DOI:\n [10.5061/dryad.k0p2ngfp5](https://doi.org/10.5061/dryad.k0p2ngfp5) ##\n Description of the data and file structure This dataset accompanies Curl\n & Hu (2026). It contains 2000 steady-state CFD simulations of a\n liquid-cooled, dual-chip, pin-fin cold plate run in ANSYS Fluent, the\n POD-NN surrogate trained on those simulations, and the source code that\n reproduces every figure in the paper from the published data. The DOE\n varies six boundary conditions (chip 1 / chip 2 surface heat fluxes, inlet\n 1 / inlet 2 mass flow rates, and inlet 1 / inlet 2 total temperatures)\n across the operational space of the cold plate using Latin Hypercube\n Sampling. The CFD model solves the steady, incompressible Navier-Stokes\n equations coupled with the energy equation and Menter's k-ω SST\n turbulence model on a ~50,000-element 3D grid; only data along the central\n yz mid-plane (11,110 nodes) is used in the published surrogate. Headline\n results from the paper: \\- POD with r=4 modes captures 99.32% of the\n temperature-field variance. \\- The trained POD-NN predicts a full 2D\n temperature field 412,500× faster than the underlying CFD solve. \\- A\n 100-sample training set is sufficient to reach the error floor; the\n iteration-cost crossover with CFD occurs at 100.5 design iterations (~2.76\n hrs). ### Files and variables File: **ned-009_AIEmulator.zip**\n Description: The zip file contains three folders (1_Dataset,\n 2_AnalyzedData, 3_SourceCode) and this README. 
#### The 1_Dataset folder\n contains: sim_0001.npz through sim_2000.npz: 2000 NumPy compressed\n archives, one per CFD simulation. The i-th file corresponds to the i-th\n boundary-condition row in model_setup.json. Open with `numpy.load(path,\n allow_pickle=True)`. Each archive contains 14 keys, all stored as float\n arrays:  - "yz-mid|temperature" — (11110,) temperature (K) on\n the central yz mid-plane (the field used in the paper)  -\n "yz-mid|coordinates" — (11110, 3) X/Y/Z node coordinates (m) for\n the yz mid-plane  - "zx-mid|temperature" — (40939,) temperature\n (K) on the zx mid-plane  - "zx-mid|coordinates" — (40939, 3)\n X/Y/Z node coordinates (m) for the zx mid-plane  -\n "bottom|temperature" — (3193,) temperature (K) on the\n bottom-face cooling boundary  - "bottom|coordinates" — (3193, 3)\n X/Y/Z node coordinates (m) for the bottom face  - Scalar outputs (each\n shape (1,)): "chip1_tavg|temperature",\n "chip1_tmax|temperature", "chip2_tavg|temperature",\n "chip2_tmax|temperature", "outlet_tavg|temperature",\n "outlet_tmax|temperature", "pdrop_1|temperature",\n "pdrop_2|temperature" model_setup.json: DOE configuration\n produced by the original ANSYS / PyFluent workflow. Required by every\n analysis script in 3_SourceCode/ — they read it to reconstruct the\n per-sample input parameter values. Top-level keys:  -\n "timestamp" — when the DOE was generated.  -\n "model_inputs" / "model_outputs" — metadata lists\n describing the parameter and output channels.  -\n "doe_configuration" — nested dict {boundary_name: {param_name:\n [value_for_sample_1, ..., value_for_sample_2000]}}. The four boundary\n names are chip1, chip2, inlet1, inlet2; their parameters are\n chip1.thermal.heat_flux, chip2.thermal.heat_flux,\n inlet1.momentum.mass_flow_rate, inlet1.thermal.total_temperature,\n inlet2.momentum.mass_flow_rate, inlet2.thermal.total_temperature.  
-\n "case_file" — local path to the original ANSYS .cas.h5 file\n (informational only; the case file itself is not redistributed here). \n \n #### The 2_AnalyzedData folder contains: **trained_model**/: the\n canonical POD-NN surrogate referenced in the paper. Trained on the first\n 100 simulations (sim_0001..sim_0100), POD fit retains 4 modes, NN trained\n for up to 500 epochs with random seed 42, batch size 8, 80/20\n train/validation split, and last 400 simulations (sim_1601..sim_2000) held\n out for testing. Final test metrics: R² = 0.998, RMSE = 1.94 K, MAE = 1.46\n K.  - pod_nn.h5 — Keras / TensorFlow neural-network weights. Architecture:\n Input(6) → Dense(64, ReLU, L2=1e-3) → Dropout(0.1) → Dense(64, ReLU,\n L2=1e-3) → Dropout(0.1) → Dense(32, ReLU, L2=1e-3) → Dense(4). Optimizer\n Adam (lr=1e-3), loss MSE.  - pca.pkl — pickled sklearn PCA object: holds\n the 4 retained POD spatial modes and the snapshot mean.  -\n param_scaler.pkl — pickled sklearn StandardScaler for the 6 input\n parameters (mean / std fit on the training set).  - mode_scaler.pkl —\n pickled sklearn StandardScaler for the 4 POD coefficients (mean / std fit\n on the training-set POD scores).  - training_history.json — per-epoch\n train and validation MSE produced by Keras `model.fit`.\n dataset_size_sensitivity.csv (script 09) — raw MSE / Max AE / MSE_POD /\n MSE_NN numbers behind the sensitivity and decomposition plots. \n \n #### The 3_SourceCode folder contains: \\_utils.py — shared module: data\n loading, POD-NN build / train / predict, save and load of trained\n artifacts, thermal-resistance computation, common matplotlib styling. All\n hyperparameters live here as module constants (TRAIN_SIZE=100,\n TEST_SIZE=400, N_MODES=4, EPOCHS=500, RANDOM_SEED=42, CHIP_AREA=0.0016\n m²). 01_train_model.py — trains the canonical POD-NN and writes the\n artifacts in 2_AnalyzedData/trained_model/. Run this first.\n 02_pod_modes.py — POD spatial mode plots. 
03_pod_reconstruction.py —\n original / 1-PC / 4-PC reconstruction quad plot. 04_pod_variance.py —\n cumulative variance vs mode count. 05_mse_history.py — train / val MSE\n curve from the saved training history. 06_time_crossover.py — analytical\n CFD-vs-POD-NN time crossover and per-component bar chart (no data load).\n 07_single_prediction.py — per-sample prediction / truth / error comparison\n plots. 08_mean_error_field.py — MAE field on the yz mid-plane across the\n test set. 09_dataset_size_sensitivity.py — sensitivity sweep (10\n trainings) + error decomposition; slow. 10_pod_alpha_sweep.py — POD mode\n α-sweep heatmaps and per-mode delta plot. 11_pearson_correlation.py — POD\n coefficient vs physical-quantity Pearson R heatmap. 12_rth_latent_space.py\n — Rth response surfaces in PC3-PC4. requirements.txt — minimum dependency\n list. ## Code/software The .npz files open with `numpy.load(path,\n allow_pickle=True)` (NumPy ≥ 1.20). The .pkl files open with Python's\n standard `pickle.load` and require scikit-learn to be installed in the\n same environment (they unpickle into sklearn PCA / StandardScaler\n instances). The .h5 file is a Keras model loadable with\n `tensorflow.keras.models.load_model(path, compile=False)`. To regenerate\n every figure from this dataset: 1\\. Create a Python ≥ 3.10 virtual\n environment and install the dependencies:    `pip install -r\n 3_SourceCode/requirements.txt` 2\\. From inside `3_SourceCode/`:    `python\n 01_train_model.py` — must run first; creates the trained model artifacts\n in 2_AnalyzedData/trained_model/.    Then run scripts 02 through 12 in any\n order. Script 09 trains 10 models and takes a few minutes; the others\n complete in seconds. Reproducibility note: every script seeds NumPy and\n TensorFlow with RANDOM_SEED=42 before training. The first 100 simulations\n are used for training, the last 400 for testing, and 80/20\n train/validation splitting is fixed. 
With the same Python / TensorFlow\n versions, repeated runs produce numerically identical results to those in\n 2_AnalyzedData/. **Related software (active development)** The current\n production version of this surrogate-modeling workflow is published at\n [https://github.com/UARK-NED3/CFDTwin](https://github.com/UARK-NED3/CFDTwin). CFDTwin is a wizard-driven desktop GUI that automates the same POD-NN pipeline end-to-end against a live ANSYS Fluent case (5-step flow: Setup → DOE → Simulate → Train → Validate). Important: the data format published here (sim_NNNN.npz + model_setup.json) is a snapshot from an earlier iteration of the codebase and is not directly loadable by the current CFDTwin release. The scripts in 3_SourceCode/ are self-contained and reproduce the paper's results from this static dataset. For new CFD cases, use CFDTwin against your own Fluent setup rather than retrofitting this dataset format."]} 
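The fixed data split in the reproducibility note above can be sketched in plain Python. This is a minimal illustration, not the authors' code: the total of 2000 simulations is inferred from the `sim_1601..sim_2000` test range, and taking the validation set as the last 20% of the training pool (the way Keras `validation_split` slices unshuffled data) is an assumption. `sim_name` and `make_split` are hypothetical helper names.

```python
# Sketch of the train / validation / test split described in the README.
# Not the authors' implementation; helper names are hypothetical.

TRAIN_SIZE = 100     # from the README (first 100 simulations)
TEST_SIZE = 400      # from the README (last 400 simulations)
N_SIMS = 2000        # assumption: inferred from the sim_1601..sim_2000 range
VAL_FRACTION = 0.2   # from the README (80/20 train/validation split)


def sim_name(index: int) -> str:
    """Return the snapshot file name for a 1-based simulation index."""
    return f"sim_{index:04d}.npz"


def make_split(n_sims=N_SIMS, train_size=TRAIN_SIZE,
               test_size=TEST_SIZE, val_fraction=VAL_FRACTION):
    """Split 1-based simulation indices into train, val, and test lists."""
    if train_size + test_size > n_sims:
        raise ValueError("training pool and test set would overlap")
    ids = list(range(1, n_sims + 1))
    pool = ids[:train_size]              # first simulations: training pool
    test = ids[n_sims - test_size:]      # last simulations: held-out test set
    n_val = int(round(train_size * val_fraction))
    cut = len(pool) - n_val
    # Assumption: validation samples are the tail of the training pool.
    return {"train": pool[:cut], "val": pool[cut:], "test": test}


split = make_split()
print(len(split["train"]), len(split["val"]), len(split["test"]))
print(sim_name(split["test"][0]), sim_name(split["test"][-1]))
```

Because the split depends only on simulation indices, it is deterministic. The random seed affects weight initialization and batch order, not which simulations land in each subset.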
  3. {"Abstract":["Woodrats of the genus Neotoma are an important study system for ecological and paleoecological research. However, paleontological studies are often hindered by the difficulty of identifying woodrat remains to species. We address this limitation by using 2D landmark-based geometric morphometrics to classify 199 lower first molars (m1s) of five extant western North American Neotoma species (N. albigula, N. cinerea, N. fuscipes, N. lepida, and N. macrotis) collected throughout California. We then use discriminant analysis of principal components (DAPC) models to identify late Pleistocene fossils of unknown species from the Rancho La Brea Tar Pits in Los Angeles, CA. DAPC correctly identifies ~85-90% of extant individuals to species, with most misclassifications occurring between sister taxa N. fuscipes and N. macrotis. Most fossil m1s are classified as N. macrotis by DAPC, which may be the first confirmation of N. macrotis in the fossil record. We show that landmark-based geometric morphometric analyses are generally effective at differentiating m1s of extant Neotoma species in California and they are an auspicious method for unknown fossil identification. Further applications of this method across a broader range of geographic locations and species will better contextualize its utility."],"Other":["This file contains images and landmark coordinates of Neotoma lower dentition analyzed in Fox and Blois (2025) “A geometric morphometric approach to identifying recent and fossil woodrat molars with remarks on late Pleistocene Neotoma macrotis from Rancho La Brea”. The file contains three subfolders: "All Images", "Recent CA Neotoma", and "Unknowns". Each subfolder, described further below, contains .jpg specimen images and .tps files of 2D landmark coordinate data. All .tps files were built and combined in TpsUtil 32 (Rohlf, 2018a) with coordinate data digitized in TpsDig 2.32 (Rohlf, 2018b). 
Please see the manuscript text for a description of analyses performed. Input data, analytical code, and output data/figures can be found in the Software section below, and at: https://github.com/bloispaleolab/Neotoma.m1.GM\n\nThe "All Images" folder contains images of all specimens analyzed and referenced in the manuscript (n = 229). It also contains two .tps files with landmark coordinate data obtained from lower first molars (m1s) of all specimens. These “Neotoma_combined_R1” and “Neotoma_combined_R2” files represent both landmarking repetitions of five extant Neotoma species from California (n = 199) combined with recent Neotoma cinerea specimens from Idaho and South Dakota (n = 13) and late Pleistocene fossil specimens from Project 23 deposits at Rancho La Brea (n = 16). Images and .tps files for each of those groups are also saved separately in the folders described below. Image files are named starting with genus and species first letter abbreviations (NX), then catalog number, then element (L = lower (mandible), LR = lower right (dentary)). Specimens are from the Museum of Vertebrate Zoology (MVZ) unless prefixed with P23 (Los Angeles County Museum of Paleontology, Rancho La Brea, Project 23) or SDSM (South Dakota School of Mines and Technology, Museum of Geology). For example, “NA_10467LR” = MVZ 10467 Neotoma albigula right dentary.\n\nThe “Recent CA Neotoma” folder contains images of recent Neotoma specimens from MVZ collected in California (n = 199). These images were obtained in November 2018 using a Dino-Lite Edge AM4815ZTL digital microscope. The folder also contains four .tps files: “Neotoma_extant_R1”, “Neotoma_extant_R2”, “WearVetted_R1”, and “WearVetted_R2”. The “Neotoma_extant” files contain landmark coordinate data for all 199 specimens. Twenty-eight of those specimens with very worn or nearly unworn molars were removed from the “WearVetted” files to evaluate wear-stage effects on tooth morphology. 
See manuscript Supplementary Table 1 for the list of removed specimens.\n\nThe “Unknowns” folder contains two subfolders: “East N.cinerea” and “LACMP23”. The “East N.cinerea” subfolder contains images of 12 Neotoma cinerea specimens from western South Dakota sampled from the SDSM Recent Vertebrates Collection (SDSM-SD-XXXXXX), and one specimen from Idaho sampled from MVZ (filename = “UCMVZ-ID-NC-51944L”). The SDSM specimens were photographed in November 2024 with a Dino-Lite Edge Plus AM4917MZT digital microscope and the MVZ specimen was photographed in November 2018 with a Dino-Lite Edge AM4815ZTL digital microscope. In this subfolder there is one .tps file with landmark coordinate data of all 13 specimens: “East_NC”. The “LACMP23” subfolder contains images of 16 late Pleistocene fossil m1s, and dentaries with m1s, from several Project 23 deposits at Rancho La Brea (P23_XXXXX). See manuscript Table 4 for the deposits each specimen was sampled from. Project 23 specimens were photographed between January and August of 2019 with a Dino-Lite Edge AM4815ZTL digital microscope. Left fossil elements were flipped to imitate right elements for analysis. This subfolder also contains one .tps file with landmark coordinate data of all 16 fossil specimens: “LACMP23”.\n\nReferences:\n\nRohlf, F.J., 2018a. TpsUtil version 1.76. Ecology & Evolution: (program), New York: Suny at Stony Brook.\n\nRohlf, F.J., 2018b. TpsDig version 2.31. Ecology & Evolution: (program), New York: Suny at Stony Brook."]} 
  4. ABSTRACT Human disturbance can have profound effects on biodiversity, including increasing hybridization between reproductively isolated species. One approach for understanding how human activity affects hybridization dynamics is to evaluate correlations between disturbance (e.g., urbanisation, temperature change) and hybridization. Because variation in hybridization can also arise from historical factors unrelated to recent human disturbance, it is essential to account for population structure to avoid spurious correlations. Here, we combine environmental and high‐coverage whole‐genome resequencing data to investigate how human disturbance and population structure affect hybridization dynamics between a pair of pine sawflies adapted to different pines, Neodiprion lecontei and Neodiprion pinetum. We find that N. lecontei and N. pinetum exhibit strikingly different patterns of population structure, which we hypothesise stem from differences in host use. We also find that recent admixture is both asymmetric and geographically variable. Linear regression analyses reveal that admixture proportion is predicted by indirect human disturbance (i.e., climate change) and not direct human disturbance (e.g., urbanisation) in both N. lecontei and N. pinetum. Lastly, in N. pinetum, we find evidence of a spurious association between admixture and direct human disturbance that disappears when regression models account for population structure via inclusion of genetic principal component scores as covariates. Together, our data suggest that indirect human disturbance and population structure both contribute to geographic variation in admixture between N. lecontei and N. pinetum. Our study also highlights the importance of adequately controlling for population structure when attempting to identify environmental predictors (human disturbance‐related or not) of hybridization. 
  5. {"Abstract":["This repository contains data and code for reproducing the results of the worked example in Smeds et al. 2025 "lacunr: Efficient 3-D lacunarity for voxelized LiDAR data from forested ecosystems", published in Methods in Ecology and Evolution (https://doi.org/10.1111/2041-210X.70126).\n\nThe worked example analyzes terrestrial LiDAR scans of two forest stands at the Saddle Mountain Open Space Preserve in Sonoma County, California, which were burned in the 2020 Glass Fire. The first of these scans, Plot 1, is included in the 'lacunr' software package as an example dataset, while the second plot is archived here. The included R script allows users to reproduce the lacunarity curves displayed in Figure 2 and Figure S1 of Smeds et al.\n\nThe deposited zip archive, lacunr_example.zip, can be decompressed into a corresponding folder with the following contents:\n\n\n\ndata/ - a data folder which contains two height-normalized terrestrial LiDAR point clouds in .laz format:\n\n\n\nc6_tls_p6_prefire.laz - 24*24m forest stand before wildfire\n\nc10_tls_p6_postfire.laz - the same forest stand after wildfire\n\n\n\nlacunr_example_data.R - R script for replicating the lacunarity curves presented in the publication\n\nlacunr_example.Rproj - the R project file which allows users to open the R script in a self-contained environment. This file should be opened in RStudio prior to executing the R script\n\noutput/ - the folder where external image files are exported by the R script\n\nREADME.txt - a text file containing instructions for opening the R project and running the associated code"]} 