ABSTRACT Hybrid zones, where genetically distinct groups of organisms meet and interbreed, offer valuable insights into the nature of species and speciation. Here, we present a new R package, bgchm, for population genomic analyses of hybrid zones. This R package extends and updates the existing bgc software and combines Bayesian analyses of hierarchical genomic clines with Bayesian methods for estimating hybrid indexes, interpopulation ancestry proportions, and geographic clines. Compared to existing software, bgchm offers enhanced efficiency through Hamiltonian Monte Carlo sampling and the ability to work with genotype likelihoods combined with a hierarchical Bayesian approach, enabling inference for diverse types of genetic data sets. The package also facilitates the quantification of introgression patterns across genomes, which is crucial for understanding reproductive isolation and speciation genetics. We first describe the models underlying bgchm and then provide an overview of the R package and illustrate its use through the analysis of simulated and empirical data sets. We show that bgchm generates accurate estimates of model parameters under a variety of conditions, especially when the genetic loci analyzed are highly ancestry informative. This includes relatively robust estimates of genome‐wide variability in clines, which has not been the focus of previous models and methods. We also illustrate how both selection and genetic drift contribute to variability in introgression among loci and how additional information can be used to help distinguish these contributions. We conclude by describing the promises and limitations of bgchm, comparing bgchm to other software for genomic cline analyses, and identifying areas for fruitful future development.
ClineHelpR: an R package for genomic cline outlier detection and visualization
Abstract Background: Patterns of multi-locus differentiation (i.e., genomic clines) often extend broadly across hybrid zones and their quantification can help diagnose how species boundaries are shaped by adaptive processes, both intrinsic and extrinsic. In this sense, the transitioning of loci across admixed individuals can be contrasted as a function of the genome-wide trend, in turn allowing an expansion of clinal theory across a much wider array of biodiversity. However, computational tools that serve to interpret and consequently visualize ‘genomic clines’ are limited, and users must often write custom, relatively complex code to do so. Results: Here, we introduce the ClineHelpR R-package for visualizing genomic clines and detecting outlier loci using output generated by two popular software packages, bgc and Introgress. ClineHelpR bundles both input generation (i.e., filtering datasets and creating specialized file formats) and output processing (e.g., MCMC thinning and burn-in) with functions that directly facilitate interpretation and hypothesis testing. Tools are also provided for post-hoc analyses that interface with external packages such as ENMeval and RIdeogram. Conclusions: Our package increases the reproducibility and accessibility of genomic cline methods, thus allowing an expanded user base and promoting these methods as mechanisms to address diverse evolutionary questions in both model and non-model organisms. Furthermore, the ClineHelpR extended functionality can evaluate genomic clines in the context of spatial and environmental features, allowing users to explore underlying processes potentially contributing to the observed patterns and helping facilitate effective conservation management strategies.
- Award ID(s): 2010774
- PAR ID: 10306901
- Publisher / Repository: Springer Science + Business Media
- Date Published:
- Journal Name: BMC Bioinformatics
- Volume: 22
- Issue: 1
- ISSN: 1471-2105
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Abstract Premise: Digitized biodiversity data offer extensive information; however, obtaining and processing biodiversity data can be daunting. Complexities arise during data cleaning, such as identifying and removing problematic records. To address these issues, we created the R package Geographic And Taxonomic Occurrence R‐based Scrubbing (gatoRs). Methods and Results: The gatoRs workflow includes functions that streamline downloading records from the Global Biodiversity Information Facility (GBIF) and Integrated Digitized Biocollections (iDigBio). We also created functions to clean downloaded specimen records. Unlike previous R packages, gatoRs accounts for differences in download structure between GBIF and iDigBio and allows for user control via interactive cleaning steps. Conclusions: Our pipeline enables the scientific community to process biodiversity data efficiently and is accessible to the R coding novice. We anticipate that gatoRs will be useful for both established and beginning users. Furthermore, we expect our package will facilitate the introduction of biodiversity‐related concepts into the classroom via the use of herbarium specimens.
-
Abstract Premise: Rubiaceae is among the most species‐rich plant families, as well as one of the most morphologically and geographically diverse. Currently available phylogenies have mostly relied on few genomic and plastid loci, as opposed to large‐scale genomic data. Target enrichment provides the ability to generate sequence data for hundreds to thousands of phylogenetically informative, single‐copy loci, which often leads to improved phylogenetic resolution at both shallow and deep taxonomic scales; however, a publicly accessible Rubiaceae‐specific probe set that allows for comparable phylogenetic inference across clades is lacking. Methods: Here, we use publicly accessible genomic resources to identify putatively single‐copy nuclear loci for target enrichment in two Rubiaceae groups: tribe Hillieae (Cinchonoideae) and tribal complex Palicoureeae+Psychotrieae (Rubioideae). We sequenced 2270 exonic regions corresponding to 1059 loci in our target clades and generated in silico target enrichment sequences for other Rubiaceae taxa using our designed probe set. To test the utility of our probe set for phylogenetic inference across Rubiaceae, we performed a coalescent‐aware phylogenetic analysis using a subset of 27 Rubiaceae taxa from 10 different tribes and three subfamilies, and one outgroup in Apocynaceae. Results: We recovered an average of 75% and 84% of targeted exons and loci, respectively, per Rubiaceae sample. Probes designed using genomic resources from a particular subfamily were most efficient at targeting sequences from taxa in that subfamily. The number of paralogs recovered during assembly varied for each clade. Phylogenetic inference of Rubiaceae with our target regions resolves relationships at various scales. Relationships are largely consistent with previous studies of relationships in the family with high support (≥0.98 local posterior probability) at nearly all nodes and evidence of gene tree discordance.
Discussion: Our probe set, which we call Rubiaceae2270x, was effective for targeting loci in species across and even outside of Rubiaceae. This probe set will facilitate phylogenomic studies in Rubiaceae and advance systematics and macroevolutionary studies in the family.
-
Abstract Background: Advances in genotyping and phenotyping techniques have enabled the acquisition of a great amount of data. Consequently, there is an interest in multivariate statistical analyses that identify genomic regions likely to contain causal mutations affecting multiple traits (i.e., pleiotropy). As the demand for multivariate analyses increases, it is imperative that optimal tools are available to assess their performance. To facilitate the testing and validation of these multivariate approaches, we developed simplePHENOTYPES, an R/CRAN package that simulates pleiotropy, partial pleiotropy, and spurious pleiotropy in a wide range of genetic architectures, including additive, dominance and epistatic models. Results: We illustrate simplePHENOTYPES’ ability to simulate thousands of phenotypes in less than one minute. We then provide two vignettes illustrating how to simulate sets of correlated traits in simplePHENOTYPES. Finally, we demonstrate the use of results from simplePHENOTYPES in a standard GWAS software, as well as the equivalence of simulated phenotypes from simplePHENOTYPES and other packages with similar capabilities. Conclusions: simplePHENOTYPES is an R/CRAN package that makes it possible to simulate multiple traits controlled by loci with varying degrees of pleiotropy. Its ability to interface with both commonly-used marker data formats and downstream quantitative genetics software and packages should facilitate a rigorous assessment of both existing and emerging statistical GWAS and GS approaches. simplePHENOTYPES is also available at https://github.com/samuelbfernandes/simplePHENOTYPES.
-
Premise: Divergence depends on the strength of selection and frequency of gene flow between taxa, while reproductive isolation relies on mating barriers and geographic distance. Less is known about how these processes interact at early stages of speciation. Here, we compared population‐level differentiation in floral phenotype and genetic sequence variation among recently diverged Castilleja to explore patterns of diversification under different scenarios of reproductive isolation. Methods: Using target enrichment enabled by the Angiosperms353 probe set, we assessed genetic distance among 50 populations of four Castilleja species. We investigated whether patterns of genetic divergence are explained by floral trait variation or geographic distance in two focal groups: the widespread C. sessiliflora and the more restricted C. purpurea species complex. Results: We document that C. sessiliflora and the C. purpurea complex are characterized by high diversity in floral color across varying geographic scales. Despite phenotypic divergence, groups were not well supported in phylogenetic analyses, and little genetic differentiation was found across targeted Angiosperms353 loci. Nonetheless, a principal coordinate analysis of single nucleotide polymorphisms revealed differentiation within C. sessiliflora across floral morphs and geography and less differentiation among species of the C. purpurea complex. Conclusions: Patterns of genetic distance in C. sessiliflora suggest species cohesion maintained over long distances despite variation in floral traits. In the C. purpurea complex, divergence in floral color across narrow geographic clines may be driven by recent selection on floral color. These contrasting patterns of floral and genetic differentiation reveal that divergence can arise via multiple eco‐evolutionary paths.