Search for: All records

Creators/Authors contains: "Hemstrom, William"

  1. Genomic data are ubiquitous across disciplines, from agriculture to biodiversity, ecology, evolution and human health. However, these datasets often contain noise or errors and are missing information that can affect the accuracy and reliability of subsequent computational analyses and conclusions. A key step in genomic data analysis is filtering — removing sequencing bases, reads, genetic variants and/or individuals from a dataset — to improve data quality for downstream analyses. Researchers are confronted with a multitude of choices when filtering genomic data; they must choose which filters to apply and select appropriate thresholds. To help usher in the next generation of genomic data filtering, we review and suggest best practices to improve the implementation, reproducibility and reporting standards for filter types and thresholds commonly applied to genomic datasets. We focus mainly on filters for minor allele frequency, missing data per individual or per locus, linkage disequilibrium and Hardy–Weinberg deviations. Using simulated and empirical datasets, we illustrate the large effects of different filtering thresholds on common population genetics statistics, such as Tajima’s D value, population differentiation (FST), nucleotide diversity (π) and effective population size (Ne). 
  2. Signals of natural selection can be quickly eroded in high gene flow systems, curtailing efforts to understand how and when genetic adaptation occurs in the ocean. This long‐standing, unresolved topic in ecology and evolution has renewed importance because changing environmental conditions are driving range expansions that may necessitate rapid evolutionary responses. One example occurs in Kellet's whelk (Kelletia kelletii), a common subtidal gastropod with an ~40‐ to 60‐day pelagic larval duration that expanded its biogeographic range northwards in the 1970s by over 300 km. To test for genetic adaptation, we performed a series of experimental crosses with Kellet's whelk adults collected from their historical (HxH) and recently expanded range (ExE), and conducted RNA‐Seq on offspring that we reared in a common garden environment. We identified 2770 differentially expressed genes (DEGs) between 54 offspring samples with either only historical range (HxH offspring) or expanded range (ExE offspring) ancestry. Using SNPs called directly from the DEGs, we assigned samples of known origin back to their range of origin with unprecedented accuracy for a marine species (92.6% and 94.5% for HxH and ExE offspring, respectively). The SNP with the highest predictive importance occurred on triosephosphate isomerase (TPI), an essential metabolic enzyme involved in cold stress response. TPI was significantly upregulated and contained a non‐synonymous mutation in the expanded range. Our findings pave the way for accurately identifying patterns of dispersal, gene flow and population connectivity in the ocean by demonstrating that experimental transcriptomics can reveal mechanisms for how marine organisms respond to changing environmental conditions. 
  3. High‐grading bias is the overestimation of power in a subset of loci caused by model overfitting. Using both empirical and simulated datasets, we show that high‐grading bias can cause severe overestimation of population structure, and thus mislead investigators, whenever highly informative or high‐FST markers are chosen (i.e., ascertained) and used for subsequent assessments, a common practice in population genetic studies. This problem can occur in panmictic populations with no local adaptation. Biased results from choosing high‐FST markers may have severe downstream implications for management and conservation, such as erroneous conservation unit delineation, which could squander limited conservation resources to protect incorrectly defined ‘populations’. Furthermore, we caution that high‐grading is not limited to FST approaches; high‐grading bias is a concern whenever a small subset of markers is first chosen to explain differences among groups based on their degree of difference and is subsequently reused to estimate the degree of difference among those groups. For example, selecting high FST loci for use in a GT‐seq panel or using differentially expressed genes to plot sample membership in multivariate space can both result in spurious structure when none exists. We illustrate that using statistically based outlier tests in place of arbitrary FST cut‐offs can reduce bias. Alternatively, permutation tests or cross‐evaluation can be used to detect high‐grading bias. We provide an R package, PCAssess, to help researchers detect and prevent high‐grading bias in genetic datasets by automating permutation tests and principal component analyses (https://github.com/hemstrow/PCAssess). 
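
The high‐grading effect described in the third abstract can be sketched with a small simulation. The example below is an illustration only: it is not the PCAssess package and does not reproduce the authors' analyses. It assumes a single panmictic population split arbitrarily into two groups, a simple Nei‐style FST of (HT - HS) / HT, and hypothetical names (`locus_fst`, `top_loci`, `mean_fst`) and sample sizes chosen for the sketch. It compares FST over all loci, FST over the loci picked for being high‐FST, a cross‐evaluated estimate on held‐out individuals, and a label‐permutation null.

```python
import random

def locus_fst(g1, g2):
    """Nei-style FST = (HT - HS) / HT for one biallelic locus (genotypes 0/1/2)."""
    p1 = sum(g1) / (2 * len(g1))
    p2 = sum(g2) / (2 * len(g2))
    p_bar = (p1 + p2) / 2
    ht = 2 * p_bar * (1 - p_bar)           # expected heterozygosity, pooled
    hs = p1 * (1 - p1) + p2 * (1 - p2)     # mean within-group heterozygosity
    return (ht - hs) / ht if ht > 0 else 0.0

def column(geno, inds, locus):
    """Genotypes of the given individuals at one locus."""
    return [geno[i][locus] for i in inds]

def top_loci(geno, a, b, k):
    """Indices of the k loci with the highest FST between groups a and b."""
    scores = [(locus_fst(column(geno, a, l), column(geno, b, l)), l)
              for l in range(len(geno[0]))]
    scores.sort(reverse=True)
    return [l for _, l in scores[:k]]

def mean_fst(geno, a, b, loci):
    """Mean per-locus FST between groups a and b over the given loci."""
    vals = [locus_fst(column(geno, a, l), column(geno, b, l)) for l in loci]
    return sum(vals) / len(vals)

# Assumed toy parameters: 40 diploid individuals, 500 unlinked loci, top 20 kept.
rng = random.Random(1)
N_IND, N_LOCI, K = 40, 500, 20
freqs = [rng.uniform(0.1, 0.9) for _ in range(N_LOCI)]
# Every individual is drawn from the same allele frequencies: no true structure.
geno = [[(rng.random() < p) + (rng.random() < p) for p in freqs]
        for _ in range(N_IND)]
group_a, group_b = list(range(20)), list(range(20, 40))

# Naive estimate on all loci versus the estimate on loci chosen for high FST.
fst_all = mean_fst(geno, group_a, group_b, range(N_LOCI))
chosen = top_loci(geno, group_a, group_b, K)
fst_high = mean_fst(geno, group_a, group_b, chosen)

# Cross-evaluation: choose loci on half the individuals, estimate on the rest.
chosen_cv = top_loci(geno, group_a[:10], group_b[:10], K)
fst_cv = mean_fst(geno, group_a[10:], group_b[10:], chosen_cv)

# Permutation null: shuffle group labels, then repeat selection and estimation.
ids = group_a + group_b
null = []
for _ in range(100):
    rng.shuffle(ids)
    pa, pb = ids[:20], ids[20:]
    null.append(mean_fst(geno, pa, pb, top_loci(geno, pa, pb, K)))
p_value = (1 + sum(v >= fst_high for v in null)) / (1 + len(null))

print(f"all loci: {fst_all:.3f}  high-graded: {fst_high:.3f}  "
      f"cross-evaluated: {fst_cv:.3f}  permutation p: {p_value:.2f}")
```

Because the selection step is repeated inside every permutation, the null distribution carries the same ascertainment as the observed value, so an inflated high‐graded FST in a panmictic sample is not expected to stand out against it.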