The effects of amino acid insertions and deletions (InDels) remain a rather under-explored area of structural biology. These variations oftentimes are the cause of numerous disease phenotypes. In spite of this, research to study InDels and their structural significance remains limited, primarily due to a lack of experimental information and computational methods. In this work, we fill this gap by modeling InDels computationally; we investigate the rigidity differences between the wildtype and a mutant variant with one or more InDels. Further, we compare how structural effects due to InDels differ from the effects of amino acid substitutions, which are another type of amino acid mutation. We finish by performing a correlation analysis between our rigidity-based metrics and wet lab data for their ability to infer the effects of InDels on protein fitness.
more »
« less
DIMPLE: deep insertion, deletion, and missense mutation libraries for exploring protein variation in evolution, disease, and biology
Abstract Insertions and deletions (indels) enable evolution and cause disease. Due to technical challenges, indels are left out of most mutational scans, limiting our understanding of them in disease, biology, and evolution. We develop a low cost and bias method, DIMPLE, for systematically generating deletions, insertions, and missense mutations in genes, which we test on a range of targets, including Kir2.1. We use DIMPLE to study how indels impact potassium channel structure, disease, and evolution. We find deletions are most disruptive overall, beta sheets are most sensitive to indels, and flexible loops are sensitive to deletions yet tolerate insertions.
more »
« less
- Award ID(s):
- 1231306
- PAR ID:
- 10587164
- Publisher / Repository:
- Genome Biology
- Date Published:
- Journal Name:
- Genome Biology
- Volume:
- 24
- Issue:
- 1
- ISSN:
- 1474-760X
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Abstract In this work, we extend vcfdist to be the first variant call benchmarking tool to jointly evaluate phased single-nucleotide polymorphisms (SNPs), small insertions/deletions (INDELs), and structural variants (SVs) for the whole genome. First, we find that a joint evaluation of small and structural variants uniformly reduces measured errors for SNPs (− 28.9%), INDELs (− 19.3%), and SVs (− 52.4%) across three datasets. vcfdist also corrects a common flaw in phasing evaluations, reducing measured flip errors by over 50%. Lastly, we show that vcfdist is more accurate than previously published works and on par with the newest approaches while providing improved result interpretability.more » « less
-
Abstract Insertions and deletions (Indels) represent one of the major variation types in the human genome and have been implicated in diseases including cancer. To study the features of somatic indels in different cancer genomes, we investigated the indels from two large samples of cancer types: invasive breast carcinoma (BRCA) and lung adenocarcinoma (LUAD). Besides mapping somatic indels in both coding and untranslated regions (UTRs) from the cancer whole exome sequences, we investigated the overlap between these indels and transcription factor binding sites (TFBSs), the key elements for regulation of gene expression that have been found in both coding and non-coding sequences. Compared to the germline indels in healthy genomes, somatic indels contain more coding indels with higher than expected frame-shift (FS) indels in cancer genomes. LUAD has a higher ratio of deletions and higher coding and FS indel rates than BRCA. More importantly, these somatic indels in cancer genomes tend to locate in sequences with important functions, which can affect the core secondary structures of proteins and have a bigger overlap with predicted TFBSs in coding regions than the germline indels. The somatic CDS indels are also enriched in highly conserved nucleotides when compared with germline CDS indels.more » « less
-
Despite insertions and deletions being the most common structural variants (SVs) found across genomes, not much is known about how much these SVs vary within populations and between closely related species, nor their significance in evolution. To address these questions, we characterized the evolution of indel SVs using genome assemblies of three closely related Heliconius butterfly species. Over the relatively short evolutionary timescales investigated, up to 18.0% of the genome was composed of indels between two haplotypes of an individual Heliconius charithonia butterfly and up to 62.7% included lineage-specific SVs between the genomes of the most distant species (11 Mya). Lineage-specific sequences were mostly characterized as transposable elements (TEs) inserted at random throughout the genome and their overall distribution was similarly affected by linked selection as single nucleotide substitutions. Using chromatin accessibility profiles (i.e., ATAC-seq) of head tissue in caterpillars to identify sequences with potential cis -regulatory function, we found that out of the 31,066 identified differences in chromatin accessibility between species, 30.4% were within lineage-specific SVs and 9.4% were characterized as TE insertions. These TE insertions were localized closer to gene transcription start sites than expected at random and were enriched for sites with significant resemblance to several transcription factor binding sites with known function in neuron development in Drosophila . We also identified 24 TE insertions with head-specific chromatin accessibility. Our results show high rates of structural genome evolution that were previously overlooked in comparative genomic studies and suggest a high potential for structural variation to serve as raw material for adaptive evolution.more » « less
-
Despite insertions and deletions being the most common structural variants (SVs) found across genomes, not much is known about how much these SVs vary within populations and between closely related species, nor their significance in evolution. To address these questions, we characterized the evolution of indel SVs using genome assemblies of three closely related Heliconius butterfly species. Over the relatively short evolutionary timescales investigated, up to 18.0% of the genome was composed of indels between two haplotypes of an individual H. charithonia butterfly and up to 62.7% included lineage-specific SVs between the genomes of the most distant species (11 Mya). Lineage-specific sequences were mostly characterized as transposable elements (TEs) inserted at random throughout the genome and their overall distribution was similarly affected by linked selection as single nucleotide substitutions. Using chromatin accessibility profiles (i.e., ATAC-seq) of head tissue in caterpillars to identify sequences with potential cis-regulatory function, we found that out of the 31,066 identified differences in chromatin accessibility between species, 30.4% were within lineage-specific SVs and 9.4% were characterized as TE insertions. These TE insertions were localized closer to gene transcription start sites than expected at random and were enriched for several transcription factor binding site candidates with known function in neuron development in Drosophila. We also identified 24 TE insertions with head-specific chromatin accessibility. Our results show high rates of structural genome evolution that were previously overlooked in comparative genomic studies and suggest a high potential for structural variation to serve as raw material for adaptive evolution.more » « less
An official website of the United States government

