Abstract Spatial biases are an intrinsic feature of occurrence data used in species distribution models (SDMs). Thinning species occurrences, in which records that lie close together in geographic or environmental space are removed before modeling, is often used to address these biases. However, thinning can also reduce SDM performance, because the benefits of removing spatial biases may be outweighed by the detrimental effects of the resulting data loss. We used real and virtual species to evaluate how spatial and environmental thinning affected different performance metrics of four SDM methods. The occurrence data of virtual species were sampled randomly, at even spacing, and in clusters in geographic space to simulate different types of spatial biases, and several spatial and environmental thinning distances were used to thin the occurrence data. For each thinning distance, we also generated null datasets by randomly removing the same number of occurrences that thinning removed, and we compared the results of the thinned and null datasets. We found that spatially or environmentally thinning occurrence data is no better than randomly removing records, given that thinned datasets performed similarly to null datasets. Specifically, spatial and environmental thinning led to a general decrease in model performance across all SDM methods. These decreases were observed for real and virtual species, grew with thinning distance, and were consistent across the different types of spatial biases. Our results suggest that thinning occurrence data usually fails to improve SDM performance and that the use of thinning approaches when modeling species distributions should be considered carefully.
The community ecology perspective of omics data
Abstract The measurement of uncharacterized pools of biological molecules through techniques such as metabarcoding, metagenomics, metatranscriptomics, metabolomics, and metaproteomics produces large, multivariate datasets. Analytical approaches for these datasets have been successfully borrowed from community ecology to characterize the molecular diversity of samples (α-diversity) and to assess how these profiles change in response to experimental treatments or across gradients (β-diversity). However, sample preparation and data collection methods generate biases and noise that confound molecular diversity estimates and require special attention. Here, we examine how technical biases and noise that are introduced into multivariate molecular data affect the estimation of the components of diversity (i.e., total number of different molecular species, or entities; total number of molecules; and the abundance distribution of molecular entities). We then explore under which conditions these biases affect the measurement of α- and β-diversity and highlight how novel methods commonly used in community ecology can be adopted to improve the interpretation and integration of multivariate molecular data.
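As a rough illustration of the diversity components named in the abstract, the sketch below computes richness (number of distinct entities), total abundance, the Shannon index as one common α-diversity measure, and Bray-Curtis dissimilarity as one common β-diversity measure. The choice of these two indices, the function names, and the count vectors are assumptions made for illustration; they are not taken from the article.

```python
import math


def richness(counts):
    """Number of distinct molecular entities with a nonzero count."""
    return sum(1 for c in counts if c > 0)


def shannon(counts):
    """Shannon index H' = -sum(p_i * ln p_i) over entities with p_i > 0."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two count vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("count vectors must have the same length")
    denominator = sum(a) + sum(b)
    if denominator == 0:
        return 0.0
    return sum(abs(x - y) for x, y in zip(a, b)) / denominator


# Hypothetical samples: same richness and total abundance,
# but very different abundance distributions.
sample_a = [10, 10, 10, 10]
sample_b = [37, 1, 1, 1]

for name, sample in (("a", sample_a), ("b", sample_b)):
    print(name, richness(sample), sum(sample), round(shannon(sample), 3))
print("Bray-Curtis a vs b:", bray_curtis(sample_a, sample_b))
```

The two hypothetical samples share richness and total abundance, so any difference in the Shannon index comes from the abundance distribution alone, which is the component most exposed to uneven technical bias.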
- PAR ID: 10425436
- Date Published:
- Journal Name: Microbiome
- Volume: 10
- Issue: 1
- ISSN: 2049-2618
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- Various methods exist for measuring molecular orientation, thereby providing insight into biochemical activities at the nanoscale. Since fluorescence intensity and not electric field is detected, these methods are limited to measuring even-order moments of molecular orientation. However, any measurement noise, for example photon shot noise, will result in nonzero measurements of any of these even-order moments, thereby causing rotationally free molecules to appear to be partially constrained. Here, we build a model to quantify measurement errors in rotational mobility. Our theoretical framework enables scientists to choose the optimal single-molecule orientation measurement technique for any desired measurement accuracy and photon budget.
- Abstract The number and diversity of phenological studies have increased rapidly in recent years. Innovative experiments, field studies, citizen science projects, and analyses of newly available historical data are contributing insights that advance our understanding of ecological and evolutionary responses to the environment, particularly climate change. However, many phenological data sets have peculiarities that are not immediately obvious and can lead to mistakes in analyses and interpretation of results. This paper aims to help researchers, especially those new to the field of phenology, understand challenges and practices that are crucial for effective studies. For example, researchers may fail to account for sampling biases in phenological data, struggle to choose or design a volunteer data collection strategy that adequately fits their project’s needs, or combine data sets in inappropriate ways. We describe ten best practices for designing studies of plant and animal phenology, evaluating data quality, and analyzing data. Practices include accounting for common biases in data, using effective citizen or community science methods, and employing appropriate data when investigating phenological mismatches. We present these best practices to help researchers entering the field take full advantage of the wealth of available data and approaches to advance our understanding of phenology and its implications for ecology.
- Biodiversity science encompasses multiple disciplines and biological scales from molecules to landscapes. Nevertheless, biodiversity data are often analyzed separately with discipline-specific methodologies, constraining resulting inferences to a single scale. To overcome this, we present a topic modeling framework to analyze community composition in cross-disciplinary datasets, including those generated from metagenomics, metabolomics, field ecology and remote sensing. Using topic models, we demonstrate how community detection in different datasets can inform the conservation of interacting plants and herbivores. We show how topic models can identify members of molecular, organismal and landscape-level communities that relate to wildlife health, from gut microbes to forage quality. We conclude with a future vision for how topic modeling can be used to design cross-scale studies that promote a holistic approach to detect, monitor and manage biodiversity.
- Genomic data are ubiquitous across disciplines, from agriculture to biodiversity, ecology, evolution and human health. However, these datasets often contain noise or errors and are missing information that can affect the accuracy and reliability of subsequent computational analyses and conclusions. A key step in genomic data analysis is filtering (removing sequencing bases, reads, genetic variants and/or individuals from a dataset) to improve data quality for downstream analyses. Researchers are confronted with a multitude of choices when filtering genomic data; they must choose which filters to apply and select appropriate thresholds. To help usher in the next generation of genomic data filtering, we review and suggest best practices to improve the implementation, reproducibility and reporting standards for filter types and thresholds commonly applied to genomic datasets. We focus mainly on filters for minor allele frequency, missing data per individual or per locus, linkage disequilibrium and Hardy–Weinberg deviations. Using simulated and empirical datasets, we illustrate the large effects of different filtering thresholds on common population genetics statistics, such as Tajima’s D value, population differentiation (FST), nucleotide diversity (π) and effective population size (Ne).

