Title: SkewIT: The Skew Index Test for large-scale GC Skew analysis of bacterial genomes
GC skew is a phenomenon observed in many bacterial genomes, wherein the two replication strands of the same chromosome contain different proportions of guanine and cytosine nucleotides. Here we demonstrate that this phenomenon, which was first discovered in the mid-1990s, can be used today as an analysis tool for the 15,000+ complete bacterial genomes in NCBI’s RefSeq library. To analyze all 15,000+ genomes, we introduce a new method, SkewIT (Skew Index Test), that calculates a single metric representing the degree of GC skew for a genome. Using this metric, we demonstrate how GC skew patterns are conserved within certain bacterial phyla, e.g. Firmicutes, but show different patterns in other phylogenetic groups such as Actinobacteria. We also discovered that outlier values of SkewIT highlight potential bacterial mis-assemblies. Using our newly defined metric, we identify multiple mis-assembled chromosomal sequences in previously published complete bacterial genomes. We provide a SkewIT web app, https://jenniferlu717.shinyapps.io/SkewIT/, that calculates SkewI for any user-provided bacterial sequence. The web app also provides an interactive interface for the data generated in this paper, allowing users to further investigate the SkewI values and thresholds of the RefSeq-97 complete bacterial genomes. Individual scripts for analysis of bacterial genomes are provided in the following repository: https://github.com/jenniferlu717/SkewIT .
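As a rough illustration of the quantity the abstract refers to, GC skew is conventionally computed per window as (G - C) / (G + C). The sketch below applies that conventional formula to consecutive non-overlapping windows. It is not the SkewIT method and does not compute the SkewI metric; the function names `gc_skew` and `windowed_gc_skew`, the window size, and the toy sequence are assumptions made for this example only.

```python
from typing import List


def gc_skew(window: str) -> float:
    """Return (G - C) / (G + C) for one window; 0.0 if it holds no G or C."""
    seq = window.upper()
    g = seq.count("G")
    c = seq.count("C")
    return (g - c) / (g + c) if (g + c) else 0.0


def windowed_gc_skew(sequence: str, window_size: int) -> List[float]:
    """GC skew of consecutive non-overlapping windows.

    A trailing window shorter than window_size is dropped.
    """
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    last_start = len(sequence) - window_size
    return [
        gc_skew(sequence[i:i + window_size])
        for i in range(0, last_start + 1, window_size)
    ]


# Toy sequence (hypothetical): a G-rich half followed by a C-rich half,
# mimicking the sign change in skew between the two replication strands.
toy = "GGGGAGGT" + "CCCCACCT"
print(windowed_gc_skew(toy, 8))  # prints [1.0, -1.0]
```

On a real chromosome the window would be far larger than eight bases, and the interesting signal is where the sign of the skew flips along the sequence.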
Award ID(s):
1744309
PAR ID:
10308624
Author(s) / Creator(s):
;
Editor(s):
Rzhetsky, Andrey
Date Published:
Journal Name:
PLOS Computational Biology
Volume:
16
Issue:
12
ISSN:
1553-7358
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Saitou, Naruya (Ed.)
    Abstract We present the Codon Statistics Database, an online database that contains codon usage statistics for all the species with reference or representative genomes in RefSeq (over 15,000). The user can search for any species and access two sets of tables. One set lists, for each codon, the frequency, the Relative Synonymous Codon Usage, and whether the codon is preferred. Another set of tables lists, for each gene, its GC content, Effective Number of Codons, Codon Adaptation Index, and frequency of optimal codons. Equivalent tables can be accessed for (1) all nuclear genes, (2) nuclear genes encoding ribosomal proteins, (3) mitochondrial genes, and (4) chloroplast genes (if available in the relevant assembly). The user can also search for any taxonomic group (e.g., “primates”) and obtain a table comparing all the species in the group. The database is free to access without registration at http://codonstatsdb.unr.edu. 
  2. Abstract In this paper, we give Pieri rules for skew dual immaculate functions and their recently discovered row-strict counterparts. We establish our rules using a right-action analogue of the skew Littlewood–Richardson rule for Hopf algebras of Lam–Lauve–Sottile. We also obtain Pieri rules for row-strict (dual) immaculate functions. 
  3. Females of many species are polyandrous. However, polyandry can give rise to conflict among individuals within families. We examined the level of polyandry and paternity skew in the common eastern yellowjacket wasp, Vespula maculifrons, in order to gain a greater understanding of conflict in social insects. We collected 10 colonies of V. maculifrons and genotyped workers and prereproductive queens at highly variable microsatellite markers to assign each to a patriline. Genotypic data revealed evidence of significant paternity skew among patrilines. In addition, we found that patrilines contributed differentially to caste production (worker vs. queen), suggesting an important role for reproductive conflict not previously discovered. We also investigated whether patterns of paternity skew and mate number varied over time. However, we found no evidence of changes in levels of polyandry when compared to historical data dating back almost 40 years. Finally, we measured a suite of morphological traits in individuals from the most common and least common patrilines in each colony to test whether males that showed highly skewed reproductive success also produced offspring that differed in phenotype. Our data revealed a weak correlation between paternity skew and morphological phenotype of offspring sired by different males, suggesting no evidence of evolutionary tradeoffs at the level investigated. Overall, this study is the first to report significant paternity and caste‐associated skew in V. maculifrons, and to investigate the phenotypic consequences of skew in a social wasp. Our results suggest that polyandry can have important consequences for the genetic and social structure of insect societies.
  4. Abstract Motivation: The study of bacterial genome dynamics is vital for understanding the mechanisms underlying microbial adaptation, growth, and their impact on host phenotype. Structural variants (SVs), genomic alterations of 50 base pairs or more, play a pivotal role in driving evolutionary processes and maintaining genomic heterogeneity within bacterial populations. While SV detection in isolate genomes is relatively straightforward, metagenomes present broader challenges due to the absence of clear reference genomes and the presence of mixed strains. In response, our proposed method, rhea, forgoes reference genomes and metagenome-assembled genomes (MAGs) by encompassing all metagenomic samples in a series (time or other metric) into a single co-assembly graph. The log fold change in graph coverage between successive samples is then calculated to call SVs that are thriving or declining. Results: We show that rhea outperforms existing methods for SV and horizontal gene transfer (HGT) detection in two simulated mock metagenomes, particularly as the simulated reads diverge from reference genomes and an increase in strain diversity is incorporated. We additionally demonstrate use cases for rhea on series metagenomic data of environmental and fermented food microbiomes to detect specific sequence alterations between successive time and temperature samples, suggesting host advantage. Our approach leverages previous work in assembly graph structural and coverage patterns to provide versatility in studying SVs across diverse and poorly characterized microbial communities for more comprehensive insights into microbial gene flux. Availability and implementation: rhea is open source and available at: https://github.com/treangenlab/rhea.
  5. Abstract More accurate and more complete predictions of cis-regulatory modules (CRMs) and constituent transcription factor (TF) binding sites (TFBSs) in genomes can facilitate characterizing functions of regulatory sequences. Here, we developed a database, predicted cis-regulatory modules (PCRMS) (https://cci-bioinfo.uncc.edu), that stores highly accurate and unprecedentedly complete maps of predicted CRMs and TFBSs in the human and mouse genomes. The web interface allows the user to browse CRMs and TFBSs in an organism, find the closest CRMs to a gene, search CRMs around a gene, and find all TFBSs of a TF. PCRMS can be a useful resource for the research community to characterize regulatory genomes. Database URL: https://cci-bioinfo.uncc.edu/