DNA damage and repair are widely studied in relation to cancer and therapeutics. Y-family DNA polymerases can bypass DNA lesions, which may result from external or internal DNA-damaging agents, including some chemotherapy agents. Overexpression of the Y-family polymerase human pol kappa can result in tumorigenesis and drug resistance in cancer. This report describes the use of computational tools to predict the effects of single nucleotide polymorphism (SNP) variants on pol kappa activity. Partial Order Optimum Likelihood (POOL), a machine learning method that uses input features from Theoretical Microscopic Titration Curve Shapes (THEMATICS), was used to identify the amino acid residues most likely to be involved in catalytic activity. The μ4 value, a metric obtained from POOL and THEMATICS that measures the degree of coupling between one ionizable amino acid and its neighbors, was then used to identify which protein mutations are likely to affect biochemical activity. The bioinformatic tools SIFT, PolyPhen-2, and FATHMM predicted most of these variants to be deleterious to function. Alongside the computational and bioinformatic predictions, we characterized the catalytic activity and stability of seventeen cancer-associated DNA pol kappa variants. We identified pol kappa variants R48I, H105Y, G147D, G154E, V177L, R298C, E362V, and R470C as having lower activity relative to wild-type (WT) pol kappa; the variants T102A, H142Y, R175Q, E210K, Y221C, N330D, N338S, K353T, and L383F were similar in catalytic efficiency to WT pol kappa. We observed that POOL predictions can be used to identify which variants have decreased activity. Predictions from bioinformatic tools such as SIFT, PolyPhen-2, and FATHMM are based on sequence comparisons; they are therefore complementary to POOL but less capable of predicting biochemical activity. These bioinformatic and computational tools can be used to identify SNP variants with deleterious effects and altered biochemical activity from a large data set.
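The variant labels in the abstract follow the usual one-letter substitution notation (wild-type residue, position, substituted residue), so R48I reads as arginine 48 replaced by isoleucine. As a minimal sketch, the two groups reported above can be parsed and tallied as follows. The names `Substitution`, `parse_substitution`, `LOWER_ACTIVITY`, and `SIMILAR_TO_WT` are hypothetical and chosen only for illustration; the code does not reproduce POOL, THEMATICS, or any of the bioinformatic tools.

```python
import re
from typing import NamedTuple


class Substitution(NamedTuple):
    """One amino acid substitution, e.g. R48I (hypothetical helper type)."""
    wild_type: str   # one-letter code of the original residue
    position: int    # residue number in the protein sequence
    variant: str     # one-letter code of the substituted residue


# Label format assumed here: one letter, digits, one letter (e.g. "H105Y").
_LABEL = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_substitution(label: str) -> Substitution:
    """Split a label such as 'R48I' into its three parts."""
    match = _LABEL.match(label)
    if match is None:
        raise ValueError(f"not a single-residue substitution label: {label!r}")
    wild_type, position, variant = match.groups()
    return Substitution(wild_type, int(position), variant)


# Groups exactly as listed in the abstract.
LOWER_ACTIVITY = ["R48I", "H105Y", "G147D", "G154E",
                  "V177L", "R298C", "E362V", "R470C"]
SIMILAR_TO_WT = ["T102A", "H142Y", "R175Q", "E210K", "Y221C",
                 "N330D", "N338S", "K353T", "L383F"]


if __name__ == "__main__":
    for name, labels in (("lower activity", LOWER_ACTIVITY),
                         ("similar to WT", SIMILAR_TO_WT)):
        parsed = sorted((parse_substitution(s) for s in labels),
                        key=lambda sub: sub.position)
        print(name, len(parsed), [sub.position for sub in parsed])
    # Sanity check: the two groups together cover the seventeen variants.
    print("total:", len(LOWER_ACTIVITY) + len(SIMILAR_TO_WT))
```

Keeping the position as an integer makes it straightforward to sort the variants along the sequence or to join them against per-residue scores, if such a table were available.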
Computational chemistry methods to investigate the effects caused by DNA variants linked with disease
• The most frequently occurring human DNA variants are single nucleotide variants, each resulting in a change of one DNA base.
• Their effect on the wild-type characteristics of the corresponding protein can be predicted by means of computational chemistry.
• A dysfunctional protein may have a predominant effect on a particular organ in the human body.
- Award ID(s): 1725573
- PAR ID: 10201297
- Date Published:
- Journal Name: Journal of Theoretical and Computational Chemistry
- Volume: 19
- Issue: 06
- ISSN: 0219-6336
- Page Range / eLocation ID: 1930001
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- Replication Protein A (RPA) is a single-strand DNA-binding protein that plays a key role in the replication and repair of DNA. RPA is a heterotrimer made of three subunits: RPA1, RPA2, and RPA3. Germline pathogenic variants affecting RPA1 were recently described in patients with Telomere Biology Disorders (TBD), also known as dyskeratosis congenita or short telomere syndrome. Premature telomere shortening is a hallmark of TBD and results in bone marrow failure and predisposition to hematologic malignancies. Building on the finding that somatic mutations in RPA subunit genes occur in ~1% of cancers, we hypothesized that germline RPA alterations might be enriched in human cancers. Because germline RPA1 mutations are linked to early-onset TBD with predisposition to myelodysplastic syndromes, we interrogated pediatric cancer cohorts to define the prevalence and spectrum of rare/novel and putative damaging germline RPA1, RPA2, and RPA3 variants. In this study of 5,993 children with cancer, 75 (1.25%) harbored heterozygous rare (non-cancer population allele frequency (AF) < 0.1%) variants in the RPA heterotrimer genes, of which 51 cases (0.85%) had ultra-rare (AF < 0.005%) or novel variants. Compared with Genome Aggregation Database (gnomAD) non-cancer controls, there was significant enrichment of ultra-rare and novel RPA1, but not RPA2 or RPA3, germline variants in our cohort (adjusted p-value < 0.05). Taken together, these findings suggest that germline putative damaging variants affecting RPA1 are found in excess in children with cancer, warranting further investigation into the functional role of these variants in oncogenesis.
- The human angiotensin-converting enzyme 2 (ACE2) and transmembrane serine protease 2 (TMPRSS2) proteins play key roles in the cellular internalization of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the coronavirus responsible for the coronavirus disease of 2019 (COVID-19) pandemic. We set out to functionally characterize the ACE2 and TMPRSS2 protein abundance for variant alleles encoding these proteins that contained non-synonymous single-nucleotide polymorphisms (nsSNPs) in their open reading frames (ORFs). Specifically, a high-throughput assay, deep mutational scanning (DMS), was employed to test the functional implications of nsSNPs, which are variants of uncertain significance in these two genes. We used a ‘landing pad’ system designed to quantify the protein expression for 433 nsSNPs that have been observed in the ACE2 and TMPRSS2 ORFs and found that 8 of 127 ACE2, 19 of 157 TMPRSS2 isoform 1, and 13 of 149 TMPRSS2 isoform 2 variant proteins displayed less than ~25% of the wild-type protein expression, whereas 4 ACE2 variants displayed 25% or greater increases in protein expression. We therefore concluded that nsSNPs in genes encoding ACE2 and TMPRSS2 might influence SARS-CoV-2 infectivity. These results can now be applied to DNA sequence data for patients infected with SARS-CoV-2 to determine the possible impact of patient-based DNA sequence variation on the clinical course of SARS-CoV-2 infection.
- A universal aptamer against spike-proteins of diverse SARS-CoV-2 variants was discovered via DNA SELEX towards the wild-type (WT) spike-protein. This aptamer, A1C1, binds to the WT spike-protein or other variants of concern such as Delta and Omicron with low nanomolar affinities. A1C1 inhibited the interaction between hACE2 and various spike-proteins by 85–89%. This universal A1C1 aptamer can be used to design diagnostic and therapeutic molecular tools to target SARS-CoV-2 and its variants.
- We used long-read DNA sequencing to assemble the genome of a Southern Han Chinese male. We organized the sequence into chromosomes and filled in gaps using the recently completed T2T-CHM13 genome as a guide, yielding a gap-free genome, Han1, containing 3,099,707,698 bases. Using the T2T-CHM13 annotation as a reference, we mapped all genes onto the Han1 genome and identified additional gene copies, generating a total of 60,708 putative genes, of which 20,003 are protein-coding. A comprehensive comparison between the genes revealed that 235 protein-coding genes were substantially different between the individuals, with frameshifts or truncations affecting the protein-coding sequence. Most of these were heterozygous variants in which one gene copy was unaffected. This represents the first gene-level comparison between two finished, annotated individual human genomes.