Background Large (>1 Mb), polymorphic inversions have substantial impacts on population structure and maintenance of genotypes. These large inversions can be detected from single nucleotide polymorphism (SNP) data using unsupervised learning techniques like PCA. Construction and analysis of a feature matrix from millions of SNPs requires a large amount of memory, which limits the sizes of data sets that can be analyzed. Methods We propose using feature hashing to construct a feature matrix from a VCF file of SNPs in order to reduce memory usage. The matrix is constructed in a streaming fashion such that the entire VCF file is never loaded into memory at one time. Results When evaluated on Anopheles mosquito and Drosophila fly data sets, our approach reduced memory usage by 97% with minimal reductions in accuracy for inversion detection and localization tasks. Conclusion With these changes, inversions in larger data sets can be analyzed easily and efficiently on common laptop and desktop computers. Our method is publicly available through our open-source inversion analysis software, Asaph.
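The idea of streaming a VCF file into a hashed feature matrix can be illustrated with a minimal sketch. This is not Asaph's actual implementation; the function names `hash_bucket` and `hashed_matrix_from_vcf`, the use of MD5 as a stable hash, the signed-hashing variant, the `chrom:pos:allele` feature key, and the assumption that `GT` is the first entry of each sample field are all choices made for this illustration only.

```python
import hashlib


def hash_bucket(key, n_buckets):
    """Map a feature key to a (column index, sign) pair with a stable hash.

    MD5 is used only because it gives the same result across runs
    (an assumption of this sketch, not a detail from the abstract).
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    value = int.from_bytes(digest[:8], "little")
    sign = 1 if value & 1 else -1          # lowest bit picks the sign
    index = (value >> 1) % n_buckets       # remaining bits pick the column
    return index, sign


def hashed_matrix_from_vcf(lines, n_buckets=1024):
    """Build a samples x n_buckets matrix from an iterable of VCF lines.

    Lines are consumed one at a time, so only the fixed-size matrix
    (not the whole file) is held in memory.
    """
    matrix = None
    for line in lines:
        if not line.strip() or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            # Columns 0-8 are fixed VCF columns; the rest are samples.
            n_samples = len(fields) - 9
            matrix = [[0.0] * n_buckets for _ in range(n_samples)]
            continue
        if matrix is None:
            raise ValueError("record found before the #CHROM header line")
        chrom, pos = fields[0], fields[1]
        for sample_idx, call in enumerate(fields[9:]):
            # Assumes GT is the first FORMAT entry; phasing is ignored.
            genotype = call.split(":")[0].replace("|", "/")
            for allele in genotype.split("/"):
                if allele == ".":
                    continue  # missing call contributes nothing
                index, sign = hash_bucket(f"{chrom}:{pos}:{allele}", n_buckets)
                matrix[sample_idx][index] += sign
    return matrix
```

In practice the iterable would be an open file handle (for example `open(path)` or `gzip.open(path, "rt")`), and the resulting matrix could be passed to PCA. The number of columns is fixed by `n_buckets` rather than by the number of SNPs, which is where the memory saving in such a scheme would come from; collisions between features are the price paid for that bound.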
Segmenting and Genotyping Large, Polymorphic Inversions
Large, polymorphic inversions can contribute to population structure and enable mutually-exclusive adaptations to survive in the same population. Current methods for detecting inversions from single-nucleotide polymorphisms (SNPs) called from population genomics data require an experienced, human user to prepare the data and interpret the results. Ideally, these methods would be completely automated yet robust to allow usage by inexperienced users. Towards this goal, automated approaches for segmentation of inversions and inference of sample genotypes are introduced and evaluated on chromosomes from flies, mosquitoes, and prairie sunflowers.
- Award ID(s): 1947257
- PAR ID: 10463483
- Date Published:
- Journal Name: 2023 IEEE International Conference on Electro Information Technology (eIT)
- Page Range / eLocation ID: 153 to 162
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Across many species where inversions have been implicated in local adaptation, genomes often evolve to contain multiple, large inversions that arise early in divergence. Why this occurs has yet to be resolved. To address this gap, we built forward-time simulations in which inversions have flexible characteristics and can invade a metapopulation undergoing spatially divergent selection for a highly polygenic trait. In our simulations, inversions typically arose early in divergence, captured standing genetic variation upon mutation, and then accumulated many small-effect loci over time. Under special conditions, inversions could also arise late in adaptation and capture locally adapted alleles. Polygenic inversions behaved similarly to a single supergene of large effect and were detectable by genome scans. Our results show that characteristics of adaptive inversions found in empirical studies (e.g. multiple large, old inversions that are F_ST outliers, sometimes overlapping with other inversions) are consistent with a highly polygenic architecture, and inversions do not need to contain any large-effect genes to play an important role in local adaptation. By combining a population and quantitative genetic framework, our results give a deeper understanding of the specific conditions needed for inversions to be involved in adaptation when the genetic architecture is polygenic. This article is part of the theme issue ‘Genomic architecture of supergenes: causes and evolutionary consequences’.
-
Abstract Large structural variants in the genome, such as inversions, may play an important role in producing population structure and local adaptation to the environment through suppression of recombination. However, relatively few studies have linked inversions to phenotypic traits that are sexually selected and may play a role in reproductive isolation. Here, we found that geographic differences in the sexually selected plumage of a warbler, the common yellowthroat (Geothlypis trichas), are largely due to differences in the Z (sex) chromosome (males are ZZ), which contains at least one putative inversion spanning 40% (31/77 Mb) of its length. The inversions on the Z chromosome vary dramatically east and west of the Appalachian Mountains, which provides evidence of cryptic population structure within the range of the most widespread eastern subspecies (G. t. trichas). In an eastern (New York) and a western (Wisconsin) population of this subspecies, females prefer different male ornaments; larger black facial masks are preferred in Wisconsin and larger yellow breasts are preferred in New York. The putative inversion also contains genes related to vision, which could influence mating preferences. Thus, structural variants on the Z chromosome are associated with geographic differences in male ornaments and female choice, which may provide a mechanism for maintaining different patterns of sexual selection in spite of gene flow between populations of the same subspecies.
-
Corbett-Detig, Russell (Ed.) The ability of genomic inversions to reduce recombination and generate linkage can have a major impact on genetically based phenotypic variation in populations. However, the increase in linkage associated with inversions can create hurdles for identifying associations between loci linked to inversions and the traits they impact. Therefore, the role of inversions in mediating genetic variation of complex traits remains to be fully understood. This study uses the fruit fly Drosophila melanogaster to investigate the impact of inversions on trait variation. We tested the effects of common inversions among a diverse assemblage of traits including aspects of behavior, morphology, and physiology, and identified that the cosmopolitan inversions In(2L)t and In(3R)Mo are associated with many traits. We compared the ability of different approaches of accounting for relatedness and inversion presence during genome-wide association to identify signals of association with SNPs. We report that commonly used association methods are underpowered within inverted regions, while alternative approaches such as leave-one-chromosome-out improve the ability to identify associations. In all, our research enhances our understanding of inversions as components of trait variation and provides insight into approaches for identifying genomic regions driving these associations.
-
Abstract We developed an open source, extensible Python‐based framework, which we call Versatile Modeling Of Deformation (VMOD), for forward and inverse modeling of crustal deformation sources. VMOD abstracts from specific source model implementations, data types and inversion methods. We implement the most common geodetic source models, which can be combined to model and analyze multi‐source deformation. VMOD supports Global Navigation Satellite System (GNSS), InSAR, electronic distance measurement, leveling and tilt data. To infer source characteristics from observations, VMOD implements non‐linear least squares and Markov Chain Monte‐Carlo Bayesian inversions, including joint inversions using different sources of data. VMOD's structure allows for easy integration of new geodetic models, data types, and inversion strategies. We benchmark the forward models against other published results and the inversion approaches against other implementations. We apply VMOD to analyze deformation at Unimak Island, Alaska, observed with continuous and campaign GNSS, and ascending and descending InSAR time series generated from Sentinel‐1 satellite radar acquisitions. These data show an inflation pattern at Westdahl volcano and subsidence at Fisher Caldera. We use VMOD to test a range of source models by jointly inverting the GNSS and InSAR data sets. Our final model simultaneously constrains the parameters of two sources. Our results reveal a depressurizing spheroid under Fisher Caldera ∼4–6 km deep, contracting at a rate of ∼2–3 Mm³/yr, and a pressurizing spherical source underneath Westdahl volcano ∼6–8 km deep, inflating at ∼5 Mm³/yr. This and past applications of VMOD to volcanic unrest benefit from an extensible framework which supports joint inversions of data sets for parameters of easily composable multi‐source models.

