Search for: All records

Creators/Authors contains: "Geib, Scott M"


  1. Hi-C characterizes three-dimensional chromatin organization, facilitates haplotype phasing, and enables genome-assembly scaffolding, but encounters difficulties across complex regions. By coupling chromosome conformation capture (3C) with PacBio HiFi long-read sequencing, here we develop a method (CiFi) that enables analysis of genomic interactions across repetitive regions. Starting with as little as 60,000 cells (sub-microgram DNA), the method produces multi-kilobasepair HiFi reads that contain multiple interacting, concatenated segments (~350 bp to 2 kbp). This multiplicity and increase in segment length versus standard short-read-based Hi-C improves read-mapping efficiency and coverage in repetitive regions and enhances haplotype phasing. CiFi pairwise interactions are largely concordant with Hi-C from a human lymphoblastoid cell line, with gains in assigning topologically associating domains across centromeres, segmental duplications, and human disease-associated genomic hotspots. As CiFi requires less input versus established methods, we apply the approach to characterize single small insects: assaying chromatin interactions across the genome from an Anopheles coluzzii mosquito and producing a chromosome-scale scaffolded assembly from a Ceratitis capitata Mediterranean fruit fly. Together, CiFi enables assessment of chromosome-scale interactions of previously recalcitrant low-complexity loci, low-input samples, and small organisms. 
  2. Vogel, K (Ed.)
    Abstract We present the first chromosome-level genome assembly for Bombus pensylvanicus, a historically widespread native pollinator species that was distributed across eastern North America but has subsequently undergone declines in range area and local relative abundance. This species has been of significant interest as a model for understanding both patterns and possible causes of bumble bee decline in the region, including the role of genetic variation. Here we present a chromosome-level reference genome assembled using Pacific Biosciences single-molecule HiFi sequences and Hi-C data and annotated using evidence derived from RNA sequencing of multiple tissue types. The B. pensylvanicus genome has a total length of ∼352.6 Mb and was assembled into a total of 224 scaffolds, with 19 primary pseudomolecules representing putative chromosomes and an N50 = 14.872 Mb. Annotation with the Eukaryotic Genome Annotation Pipeline - External (EGAPx) identified 11,411 genes (10,263 protein coding), and BUSCO analysis of 5,991 Hymenoptera-specific BUSCO groups indicated a completeness for the proteins of 99.0% (98.6% single-copy, 0.5% duplicated) and for the genome of 98.5% (98.2% single-copy, 0.3% duplicated). We present synteny analyses with other recently assembled Bombus genomes representing different subgenera and examine the distribution of repetitive regions of the genome relative to the distribution of genes and noncoding RNAs.
  3. Vogel, K (Ed.)
    Abstract The Hunt bumble bee, Bombus huntii, is a widely distributed pollinator in western North America. The species produces large colony sizes in captive rearing conditions, experiences low parasite and pathogen loads, and has been demonstrated to be an effective pollinator of tomatoes grown in controlled environment agriculture systems. These desirable traits have galvanized producer efforts to develop commercial Bombus huntii colonies for growers to deliver pollination services to crops. To better understand Bombus huntii biology and support population genetic studies and breeding decisions, we sequenced and assembled the Bombus huntii genome from a single haploid male. High-fidelity sequencing of the entire genome using PacBio, along with HiC sequencing, led to a comprehensive contig assembly of high continuity. This assembly was further organized into a chromosomal arrangement, successfully identifying 18 chromosomes spread across the 317.4 Mb assembly with a BUSCO score indicating 97.6% completeness. Synteny analysis demonstrates shared chromosome number (n = 18) with Bombus terrestris, a species belonging to a different subgenus, matching the expectation that presence of 18 haploid chromosomes is an ancestral trait at least between the subgenera Pyrobombus and Bombus sensu stricto. In conclusion, the assembly outcome, alongside the minimal tissue sampled destructively, showcases efficient techniques for producing a comprehensive, highly contiguous genome. 
  4. Abstract Understanding the genetics of adaptation and speciation is critical for a complete picture of how biodiversity is generated and maintained. Heterogeneous genomic differentiation between diverging taxa is commonly documented, with genomic regions of high differentiation interpreted as resulting from differential gene flow, linked selection and reduced recombination rates. Disentangling the roles of each of these non‐exclusive processes in shaping genome‐wide patterns of divergence is challenging but will enhance our knowledge of the repeatability of genomic landscapes across taxa. Here, we combine whole‐genome resequencing and genome feature data to investigate the processes shaping the genomic landscape of differentiation for a sister‐species pair of haplodiploid pine sawflies, Neodiprion lecontei and Neodiprion pinetum. We find genome‐wide correlations between genome features and summary statistics are consistent with pervasive linked selection, with patterns of diversity and divergence more consistently predicted by exon density and recombination rate than the neutral mutation rate (approximated by dS). We also find that both global and local patterns of FST, dXY and π provide strong support for recurrent selection as the primary selective process shaping variation across pine sawfly genomes, with some contribution from balancing selection and lineage‐specific linked selection. Because inheritance patterns for haplodiploid genomes are analogous to those of sex chromosomes, we hypothesize that haplodiploids may be especially prone to recurrent selection, even if gene flow occurred throughout divergence. Overall, our study helps fill an important taxonomic gap in the genomic landscape literature and contributes to our understanding of the processes that shape genome‐wide patterns of genetic variation.
  5. Comparative genomic studies of social insects suggest that changes in gene regulation are associated with evolutionary transitions in social behavior, but the activity of predicted regulatory regions has not been tested empirically. We used STARR-seq, a high-throughput enhancer discovery tool, to identify and measure the activity of enhancers in the socially variable sweat bee, Lasioglossum albipes. We identified over 36,000 enhancers in the L. albipes genome from three social and three solitary populations. Many enhancers were identified in only a subset of L. albipes populations, revealing rapid divergence in regulatory regions within this species. Population-specific enhancers were often proximal to the same genes across populations, suggesting compensatory gains and losses of regulatory regions may preserve gene activity. We also identified 1182 enhancers with significant differences in activity between social and solitary populations, some of which are conserved regulatory regions across species of bees. These results indicate that social trait variation in L. albipes is driven both by the fine-tuning of ancient enhancers and by lineage-specific regulatory changes. Combining enhancer activity with population genetic data revealed variants associated with differences in enhancer activity and identified a subset of differential enhancers with signatures of selection associated with social behavior. Together, these results provide the first empirical map of enhancers in a socially flexible bee and highlight links between cis-regulatory variation and the evolution of social behavior.
  6. Hahn, Matthew (Ed.)
    Abstract Rapidly evolving taxa are excellent models for understanding the mechanisms that give rise to biodiversity. However, developing an accurate historical framework for comparative analysis of such lineages remains a challenge due to ubiquitous incomplete lineage sorting (ILS) and introgression. Here, we use a whole-genome alignment, multiple locus-sampling strategies, and summary-tree and single nucleotide polymorphism-based species-tree methods to infer a species tree for eastern North American Neodiprion species, a clade of pine-feeding sawflies (Order: Hymenoptera; Family: Diprionidae). We recovered a well-supported species tree that, except for three uncertain relationships, was robust to different strategies for analyzing whole-genome data. Nevertheless, underlying gene-tree discordance was high. To understand this genealogical variation, we used multiple linear regression to model site concordance factors estimated in 50-kb windows as a function of several genomic predictor variables. We found that site concordance factors tended to be higher in regions of the genome with more parsimony-informative sites, fewer singletons, less missing data, lower GC content, more genes, lower recombination rates, and lower D-statistics (less introgression). Together, these results suggest that ILS, introgression, and genotyping error all shape the genomic landscape of gene-tree discordance in Neodiprion. More generally, our findings demonstrate how combining phylogenomic analysis with knowledge of local genomic features can reveal mechanisms that produce topological heterogeneity across genomes.
  7. Lawniczak, M (Ed.)
    Abstract The parasitoid wasp Venturia canescens is an important biological control agent of stored-product moth pests and serves as a model to study the function and evolution of domesticated endogenous viruses (DEVs). The DEVs discovered in V. canescens are known as virus-like particles (VcVLPs), which are produced using nudivirus-derived components and incorporate wasp-derived virulence proteins instead of packaged nucleic acids. Previous studies of virus-derived components in the V. canescens genome identified 53 nudivirus-like genes organized in six gene clusters and several viral pseudogenes, but how VcVLP genes are organized among wasp chromosomes following their integration in the ancestral wasp genome is largely unknown. Here, we present a chromosome-scale genome of V. canescens consisting of 11 chromosomes and 56 unplaced small scaffolds. The genome size is 290.8 Mbp with an N50 scaffold size of 24.99 Mbp. A high-quality gene set including 11,831 protein-coding genes was produced using RNA-Seq data as well as publicly available peptide sequences from related Hymenoptera. A manual annotation of genes of viral origin produced 61 intact and 19 pseudogenized nudivirus-derived genes. The genome assembly revealed that two previously identified clusters were joined into a single cluster and that a total of 5 gene clusters comprising 60 intact nudivirus-derived genes were located on three chromosomes. In contrast, pseudogenes are dispersed among 8 chromosomes, with only 4 pseudogenes associated with nudivirus gene clusters. The architecture of genes encoding VcVLP components suggests that it originates from a recent virus acquisition and that there is a link between the processes of dispersal and pseudogenization.
This high-quality genome assembly and annotation represents the first chromosome-scale assembly for parasitoid wasps associated with VLPs, and is publicly available in the National Center for Biotechnology Information Genome and RefSeq databases, providing a valuable resource for future studies of DEVs in parasitoid wasps. 
  8. Abstract Background: The small hive beetle (SHB), Aethina tumida, has emerged as a worldwide threat to honey bees in the past two decades. These beetles harvest nest resources, feed on larval bees, and ultimately spoil nest resources with gelatinous slime together with the fungal symbiont Kodamaea ohmeri. Results: Here, we present the first chromosome-level genome assembly for the SHB. With a 99.1% representation of conserved (BUSCO) arthropod genes, this resource enables the study of chemosensory, digestive, and detoxification traits critical for SHB success and possible control. We use this annotated assembly to characterize features of SHB sex chromosomes and a female-skewed primary sex ratio. We also found chromosome fusion and a lower recombination rate in sex chromosomes than in autosomes. Conclusions: Genome-enabled insights will clarify the traits that allowed this beetle to exploit hive resources successfully and will be critical for determining the causes of observed sex ratio asymmetries.
  9. The phylum Arthropoda includes species crucial for ecosystem stability, soil health, crop production, and others that present obstacles to crop and animal agriculture. The United States Department of Agriculture’s Agricultural Research Service initiated the Ag100Pest Initiative to generate reference genome assemblies of arthropods that are (or may become) pests to agricultural production and global food security. We describe the project goals, process, status, and future. The first three years of the project were focused on species selection, specimen collection, and the construction of lab and bioinformatics pipelines for the efficient production of assemblies at scale. Contig-level assemblies of 47 species are presented, all of which were generated from single specimens. Lessons learned and optimizations leading to the current pipeline are discussed. The project name implies a target of 100 species, but the efficiencies gained during the project have supported an expansion of the original goal and a total of 158 species are currently in the pipeline. We anticipate that the processes described in the paper will help other arthropod research groups or other consortia considering genome assembly at scale.