Title: Computational chemistry methods to investigate the effects caused by DNA variants linked with disease
• The most frequently occurring human DNA variants are single nucleotide variants, which change a single DNA base.
• Their effect on the wild-type characteristics of the corresponding protein can be predicted by means of computational chemistry.
• A dysfunctional protein may have a predominant effect on a particular organ in the human body.
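As an illustration of the first point, the sketch below shows how a single-base change in a coding sequence can alter the encoded amino acid. It is a minimal example, not a method from the article: the function name `classify_snv`, the toy coding sequence, and the effect labels are assumptions made for illustration, and only the standard genetic code is used.

```python
from itertools import product

# Standard genetic code, listed in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_snv(cds, position, alt_base):
    """Return (reference amino acid, variant amino acid, effect) for one SNV.

    `cds` is an in-frame coding sequence (hypothetical input), `position`
    is a 0-based index into it, and `alt_base` is the substituted base.
    """
    if cds[position] == alt_base:
        raise ValueError("alt_base must differ from the reference base")

    offset = position % 3                 # position of the base inside its codon
    codon_start = position - offset
    ref_codon = cds[codon_start:codon_start + 3]
    alt_codon = ref_codon[:offset] + alt_base + ref_codon[offset + 1:]

    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]

    if ref_aa == alt_aa:
        effect = "synonymous"
    elif alt_aa == "*":
        effect = "nonsense"
    elif ref_aa == "*":
        effect = "stop-lost"
    else:
        effect = "missense"
    return ref_aa, alt_aa, effect


# Toy sequence: ATG GAG TAA (Met, Glu, stop). Changing the middle base of
# GAG to T gives GTG, a Glu-to-Val missense change.
result = classify_snv("ATGGAGTAA", 4, "T")
```

Only missense and nonsense changes alter the protein sequence; predicting how strongly such a change perturbs structure or function is the part that requires the computational chemistry methods the article reviews.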
Award ID(s):
1725573
NSF-PAR ID:
10201297
Journal Name:
Journal of Theoretical and Computational Chemistry
Volume:
19
Issue:
06
ISSN:
0219-6336
Page Range / eLocation ID:
1930001
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Replication Protein A (RPA) is a single-stranded DNA-binding protein that plays a key role in the replication and repair of DNA. RPA is a heterotrimer made of three subunits: RPA1, RPA2, and RPA3. Germline pathogenic variants affecting RPA1 were recently described in patients with Telomere Biology Disorders (TBD), also known as dyskeratosis congenita or short telomere syndrome. Premature telomere shortening is a hallmark of TBD and results in bone marrow failure and predisposition to hematologic malignancies. Building on the finding that somatic mutations in RPA subunit genes occur in ~1% of cancers, we hypothesized that germline RPA alterations might be enriched in human cancers. Because germline RPA1 mutations are linked to early-onset TBD with predisposition to myelodysplastic syndromes, we interrogated pediatric cancer cohorts to define the prevalence and spectrum of rare/novel and putative damaging germline RPA1, RPA2, and RPA3 variants. In this study of 5,993 children with cancer, 75 (1.25%) harbored heterozygous rare (non-cancer population allele frequency (AF) < 0.1%) variants in the RPA heterotrimer genes, of which 51 cases (0.85%) had ultra-rare (AF < 0.005%) or novel variants. Compared with Genome Aggregation Database (gnomAD) non-cancer controls, there was significant enrichment of ultra-rare and novel RPA1, but not RPA2 or RPA3, germline variants in our cohort (adjusted p-value < 0.05). Taken together, these findings suggest that germline putative damaging variants affecting RPA1 are found in excess in children with cancer, warranting further investigation into the functional role of these variants in oncogenesis.
  2. INTRODUCTION To faithfully distribute genetic material to daughter cells during cell division, spindle fibers must couple to DNA by means of a structure called the kinetochore, which assembles at each chromosome’s centromere. Human centromeres are located within large arrays of tandemly repeated DNA sequences known as alpha satellite (αSat), which often span millions of base pairs on each chromosome. Arrays of αSat are frequently surrounded by other types of tandem satellite repeats, which have poorly understood functions, along with nonrepetitive sequences, including transcribed genes. Previous genome sequencing efforts have been unable to generate complete assemblies of satellite-rich regions because of their scale and repetitive nature, limiting the ability to study their organization, variation, and function. RATIONALE Pericentromeric and centromeric (peri/centromeric) satellite DNA sequences have remained almost entirely missing from the assembled human reference genome for the past 20 years. Using a complete, telomere-to-telomere (T2T) assembly of a human genome, we developed and deployed tailored computational approaches to reveal the organization and evolutionary patterns of these satellite arrays at both large and small length scales. We also performed experiments to map precisely which αSat repeats interact with kinetochore proteins. Last, we compared peri/centromeric regions among multiple individuals to understand how these sequences vary across diverse genetic backgrounds. RESULTS Satellite repeats constitute 6.2% of the T2T-CHM13 genome assembly, with αSat representing the single largest component (2.8% of the genome). 
By studying the sequence relationships of αSat repeats in detail across each centromere, we found genome-wide evidence that human centromeres evolve through “layered expansions.” Specifically, distinct repetitive variants arise within each centromeric region and expand through mechanisms that resemble successive tandem duplications, whereas older flanking sequences shrink and diverge over time. We also revealed that the most recently expanded repeats within each αSat array are more likely to interact with the inner kinetochore protein Centromere Protein A (CENP-A), which coincides with regions of reduced CpG methylation. This suggests a strong relationship between local satellite repeat expansion, kinetochore positioning, and DNA hypomethylation. Furthermore, we uncovered large and unexpected structural rearrangements that affect multiple satellite repeat types, including active centromeric αSat arrays. Last, by comparing sequence information from nearly 1600 individuals’ X chromosomes, we observed that individuals with recent African ancestry possess the greatest genetic diversity in the region surrounding the centromere, which sometimes contains a predominantly African αSat sequence variant. CONCLUSION The genetic and epigenetic properties of centromeres are closely interwoven through evolution. These findings raise important questions about the specific molecular mechanisms responsible for the relationship between inner kinetochore proteins, DNA hypomethylation, and layered αSat expansions. Even more questions remain about the function and evolution of non-αSat repeats. To begin answering these questions, we have produced a comprehensive encyclopedia of peri/centromeric sequences in a human genome, and we demonstrated how these regions can be studied with modern genomic tools. Our work also illuminates the rich genetic variation hidden within these formerly missing regions of the genome, which may contribute to health and disease. 
This unexplored variation underlines the need for more T2T human genome assemblies from genetically diverse individuals.
     Figure: Gapless assemblies illuminate centromere evolution. (Top) The organization of peri/centromeric satellite repeats. (Bottom left) A schematic portraying (i) evidence for centromere evolution through layered expansions and (ii) the localization of inner-kinetochore proteins in the youngest, most recently expanded repeats, which coincide with a region of DNA hypomethylation. (Bottom right) An illustration of the global distribution of chrX centromere haplotypes, showing increased diversity in populations with recent African ancestry.
  3. DNA damage and repair are widely studied in relation to cancer and therapeutics. Y-family DNA polymerases can bypass DNA lesions, which may result from external or internal DNA damaging agents, including some chemotherapy agents. Overexpression of Y-family polymerase human pol kappa can result in tumorigenesis and drug resistance in cancer. This report describes the use of computational tools to predict the effects of single nucleotide polymorphism variants on pol kappa activity. Partial Order Optimum Likelihood (POOL), a machine learning method that uses input features from Theoretical Microscopic Titration Curve Shapes (THEMATICS), was used to identify amino acid residues most likely involved in catalytic activity. The μ4 value, a metric obtained from POOL and THEMATICS that serves as a measure of the degree of coupling between one ionizable amino acid and its neighbors, was then used to identify which protein mutations are likely to impact the biochemical activity. Bioinformatic tools SIFT, PolyPhen-2, and FATHMM predicted most of these variants to be deleterious to function. Along with computational and bioinformatic predictions, we characterized the catalytic activity and stability of seventeen cancer-associated DNA pol kappa variants. We identified pol kappa variants R48I, H105Y, G147D, G154E, V177L, R298C, E362V, and R470C as having lower activity relative to wild-type pol kappa; the pol kappa variants T102A, H142Y, R175Q, E210K, Y221C, N330D, N338S, K353T, and L383F were identified as similar in catalytic efficiency to WT pol kappa. We observed that POOL predictions can be used to predict which variants have decreased activity. Predictions from bioinformatic tools like SIFT, PolyPhen-2 and FATHMM are based on sequence comparisons and therefore are complementary to POOL but are less capable of predicting biochemical activity. 
These bioinformatic and computational tools can be used to identify SNP variants with deleterious effects and altered biochemical activity from a large data set. 
  4. The human angiotensin-converting enzyme 2 (ACE2) and transmembrane serine protease 2 (TMPRSS2) proteins play key roles in the cellular internalization of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the coronavirus responsible for the coronavirus disease 2019 (COVID-19) pandemic. We set out to functionally characterize protein abundance for ACE2 and TMPRSS2 variant alleles that contain non-synonymous single-nucleotide polymorphisms (nsSNPs) in their open reading frames (ORFs). A high-throughput assay, deep mutational scanning (DMS), was employed to test the functional implications of nsSNPs that are variants of uncertain significance in these two genes. Specifically, we used a ‘landing pad’ system designed to quantify the protein expression for 433 nsSNPs that have been observed in the ACE2 and TMPRSS2 ORFs and found that 8 of 127 ACE2, 19 of 157 TMPRSS2 isoform 1 and 13 of 149 TMPRSS2 isoform 2 variant proteins displayed less than ~25% of the wild-type protein expression, whereas 4 ACE2 variants displayed 25% or greater increases in protein expression. We therefore concluded that nsSNPs in genes encoding ACE2 and TMPRSS2 might influence SARS-CoV-2 infectivity. These results can now be applied to DNA sequence data for patients infected with SARS-CoV-2 to determine the possible impact of patient-based DNA sequence variation on the clinical course of SARS-CoV-2 infection.
  5. Background

    The paucity of SARS-CoV-2-specific virulence factors has greatly hampered the therapeutic management of patients with COVID-19 disease. Although available vaccines and approved therapies have shown tremendous benefits, the continuous emergence of new variants of SARS-CoV-2 and side effects of existing treatments continue to challenge therapy, necessitating the development of a novel effective therapy. We have previously shown that our novel single-stranded DNA aptamers not only target the trimer S protein of SARS-CoV-2, but also block the interaction between ACE2 receptors and the trimer S protein of the Wuhan-origin, Delta, Delta plus, Alpha, Lambda, Mu, and Omicron variants of SARS-CoV-2. Here, we performed in vivo experiments in which the aptamer was administered to the lungs by intubation, as well as in vitro studies using PBMCs, to demonstrate the efficacy and safety of our most effective aptamer, AYA2012004_L.

    Methods

    In vivo studies were conducted in transgenic mice expressing human ACE2 (K18hACE2), C57BL/6J, and Balb/cJ. Flow cytometry was used to check the uptake of S-protein-expressing pseudo-virus-like particles (VLPs) by lung cells and to test the immunogenicity of AYA2012004_L. The Ames test was used to assess the mutagenicity of AYA2012004_L. RT-PCR and histopathology were used to determine the biodistribution and toxicity of AYA2012004_L in the vital organs of mice.

    Results

    We measured the in vivo uptake of VLPs by lung cells by detecting the GFP signal using flow cytometry. AYA2012004_L specifically neutralized VLP uptake and induced no inflammatory response in mouse lungs. In addition, AYA2012004_L did not induce an inflammatory response in the lungs of Th1 and Th2 mouse models or in human PBMCs. AYA2012004_L was detectable in mouse lungs and was present in only insignificant amounts in other vital organs. Accumulation of AYA2012004_L in organs decreased over time. AYA2012004_L did not induce degenerative signs in tissues, as seen by histopathology, and did not cause changes in the body weight of mice. The Ames test also showed that AYA2012004_L is non-mutagenic, supporting its safety for in vivo studies.

    Conclusions

    Our aptamer is safe and effective, and it can neutralize the uptake of VLPs by lung cells when administered locally, suggesting that it can be used as a potential therapeutic agent for COVID-19 management.

     
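The proportions quoted in items 1 and 4 above can be recomputed from the raw counts those abstracts report. The sketch below is only an arithmetic check; the helper name `percent` and the dictionary layout are assumptions made for illustration and do not come from either study.

```python
def percent(count, total):
    """Return count/total as a percentage rounded to two decimals."""
    return round(100 * count / total, 2)


# Item 1: rare and ultra-rare/novel RPA variant carriers among 5,993 children.
COHORT_SIZE = 5993
rare = percent(75, COHORT_SIZE)
ultra_rare = percent(51, COHORT_SIZE)

# Item 4: variants with less than ~25% of wild-type expression, per protein,
# stored as (low-expression variants, variants tested).
low_expression = {
    "ACE2": (8, 127),
    "TMPRSS2 isoform 1": (19, 157),
    "TMPRSS2 isoform 2": (13, 149),
}
total_variants = sum(tested for _, tested in low_expression.values())
low_expression_percent = {
    name: percent(low, tested) for name, (low, tested) in low_expression.items()
}

print(rare, ultra_rare, total_variants)
print(low_expression_percent)
```

The per-protein counts sum to the 433 nsSNPs stated in item 4, and the cohort percentages match the 1.25% and 0.85% stated in item 1.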