Title: The fidelity of transcription in human cells
To determine the error rate of transcription in human cells, we analyzed the transcriptome of H1 human embryonic stem cells with a circle-sequencing approach that allows for high-fidelity sequencing of the transcriptome. These experiments identified approximately 100,000 errors distributed over every major RNA species in human cells. Our results indicate that different RNA species display different error rates, suggesting that human cells prioritize the fidelity of some RNAs over others. Cross-referencing the errors that we detected with various genetic and epigenetic features of the human genome revealed that the in vivo error rate in human cells changes along the length of a transcript and is further modified by genetic context, repetitive elements, epigenetic markers, and the speed of transcription. Our experiments further suggest that BRCA1, a DNA repair protein implicated in breast cancer, has a previously unknown role in the suppression of transcription errors. Finally, we analyzed the distribution of transcription errors in multiple tissues of a new mouse model and found that they occur preferentially in neurons, compared to other cell types. These observations lend additional weight to the idea that transcription errors play a key role in the progression of various neurological disorders, including Alzheimer’s disease.
Award ID(s):
2119963
PAR ID:
10418029
Journal Name:
Proceedings of the National Academy of Sciences
Volume:
120
Issue:
5
ISSN:
0027-8424
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Agashe, Deepa (Ed.)
    Abstract Because errors at the DNA level power pathogen evolution, a systematic understanding of the rate and molecular spectra of mutations could guide the avoidance and treatment of infectious diseases. We thus accumulated tens of thousands of spontaneous mutations in 768 repeatedly bottlenecked lineages of 18 strains from various geographical sites, temporal spread, and genetic backgrounds. Entailing over ∼1.36 million generations, the resultant data yield an average mutation rate of ∼0.0005 per genome per generation, with a significant within-species variation. This is one of the lowest bacterial mutation rates reported, giving direct support for a high genome stability in this pathogen resulting from high DNA-mismatch-repair efficiency and replication-machinery fidelity. Pathogenicity genes do not exhibit an accelerated mutation rate, and thus, elevated mutation rates may not be the major determinant for the diversification of toxin and secretion systems. Intriguingly, a low error rate at the transcript level is not observed, suggesting distinct fidelity of the replication and transcription machinery. This study urges more attention on the most basic evolutionary processes of even the best-known human pathogens and deepens the understanding of their genome evolution. 
  2. Abstract Background: Accurate and comprehensive annotation of transcript sequences is essential for transcript quantification and differential gene and transcript expression analysis. Single-molecule long-read sequencing technologies provide improved integrity of transcript structures including alternative splicing, and transcription start and polyadenylation sites. However, accuracy is significantly affected by sequencing errors, mRNA degradation, or incomplete cDNA synthesis. Results: We present a new and comprehensive Arabidopsis thaliana Reference Transcript Dataset 3 (AtRTD3). AtRTD3 contains over 169,000 transcripts (twice as many as the best current Arabidopsis transcriptome), including over 1500 novel genes. Seventy-eight percent of transcripts are from Iso-seq with accurately defined splice junctions and transcription start and end sites. We develop novel methods to determine splice junctions and transcription start and end sites accurately. Mismatch profiles around splice junctions provide a powerful feature to distinguish correct splice junctions and remove false splice junctions. Stratified approaches identify high-confidence transcription start and end sites and remove fragmentary transcripts due to degradation. AtRTD3 is a major improvement over existing transcriptomes as demonstrated by analysis of an Arabidopsis cold response RNA-seq time-series. AtRTD3 provides higher resolution of transcript expression profiling and identifies cold-induced differential transcription start and polyadenylation site usage. Conclusions: AtRTD3 is currently the most comprehensive Arabidopsis transcriptome. It improves the precision of differential gene and transcript expression, differential alternative splicing, and transcription start/end site usage analysis from RNA-seq data. The novel methods for identifying accurate splice junctions and transcription start/end sites are widely applicable and will improve single-molecule sequencing analysis from any species.
  3. Background: Accurate and comprehensive annotation of transcript sequences is essential for transcript quantification and differential gene and transcript expression analysis. Single-molecule long-read sequencing technologies provide improved integrity of transcript structures including alternative splicing, and transcription start and polyadenylation sites. However, accuracy is significantly affected by sequencing errors, mRNA degradation, or incomplete cDNA synthesis. Results: We present a new and comprehensive Arabidopsis thaliana Reference Transcript Dataset 3 (AtRTD3). AtRTD3 contains over 169,000 transcripts (twice as many as the best current Arabidopsis transcriptome), including over 1500 novel genes. Seventy-eight percent of transcripts are from Iso-seq with accurately defined splice junctions and transcription start and end sites. We develop novel methods to determine splice junctions and transcription start and end sites accurately. Mismatch profiles around splice junctions provide a powerful feature to distinguish correct splice junctions and remove false splice junctions. Stratified approaches identify high-confidence transcription start and end sites and remove fragmentary transcripts due to degradation. AtRTD3 is a major improvement over existing transcriptomes as demonstrated by analysis of an Arabidopsis cold response RNA-seq time-series. AtRTD3 provides higher resolution of transcript expression profiling and identifies cold-induced differential transcription start and polyadenylation site usage. Conclusions: AtRTD3 is currently the most comprehensive Arabidopsis transcriptome. It improves the precision of differential gene and transcript expression, differential alternative splicing, and transcription start/end site usage analysis from RNA-seq data. The novel methods for identifying accurate splice junctions and transcription start/end sites are widely applicable and will improve single-molecule sequencing analysis from any species.
  4. Abstract Yeasts are naturally diverse, genetically tractable, and easy to grow such that researchers can investigate any number of genotypes, environments, or interactions thereof. However, studies of yeast transcriptomes have been limited by the processing capabilities of traditional RNA sequencing techniques. Here we optimize a powerful, high‐throughput single‐cell RNA sequencing (scRNAseq) platform, SPLiT‐seq (Split Pool Ligation‐based Transcriptome sequencing), for yeasts and apply it to 43,388 cells of multiple species and ploidies. This platform utilizes a combinatorial barcoding strategy to enable massively parallel RNA sequencing of hundreds of yeast genotypes or growth conditions at once. This method can be applied to most species or strains of yeast for a fraction of the cost of traditional scRNAseq approaches. Thus, our technology permits researchers to leverage “the awesome power of yeast” by allowing us to survey the transcriptome of hundreds of strains and environments in a short period of time and with no specialized equipment. The key to this method is that sequential barcodes are probabilistically appended to cDNA copies of RNA while the molecules remain trapped inside of each cell. Thus, the transcriptome of each cell is labeled with a unique combination of barcodes. Since SPLiT‐seq uses the cell membrane as a container for this reaction, many cells can be processed together without the need to physically isolate them from one another in separate wells or droplets. Further, the first barcode in the sequence can be chosen intentionally to identify samples from different environments or genetic backgrounds, enabling multiplexing of hundreds of unique perturbations in a single experiment. In addition to greater multiplexing capabilities, our method also facilitates a deeper investigation of biological heterogeneity, given its single‐cell nature. 
    For example, in the data presented here, we detect transcriptionally distinct cell states related to cell cycle, ploidy, metabolic strategies, and so forth, all within clonal yeast populations grown in the same environment. Hence, our technology has two obvious and impactful applications for yeast research: the first is the general study of transcriptional phenotypes across many strains and environments, and the second is investigating cell‐to‐cell heterogeneity across the entire transcriptome.
  5. Knowledge of locations and activities of cis-regulatory elements (CREs) is needed to decipher basic mechanisms of gene regulation and to understand the impact of genetic variants on complex traits. Previous studies identified candidate CREs (cCREs) using epigenetic features in one species, making comparisons difficult between species. In contrast, we conducted an interspecies study defining epigenetic states and identifying cCREs in blood cell types to generate regulatory maps that are comparable between species, using integrative modeling of eight epigenetic features jointly in human and mouse in our Validated Systematic Integration (VISION) Project. The resulting catalogs of cCREs are useful resources for further studies of gene regulation in blood cells, indicated by high overlap with known functional elements and strong enrichment for human genetic variants associated with blood cell phenotypes. The contribution of each epigenetic state in cCREs to gene regulation, inferred from a multivariate regression, was used to estimate epigenetic state regulatory potential (esRP) scores for each cCRE in each cell type, which were used to categorize dynamic changes in cCREs. Groups of cCREs displaying similar patterns of regulatory activity in human and mouse cell types, obtained by joint clustering on esRP scores, harbor distinctive transcription factor binding motifs that are similar between species. An interspecies comparison of cCREs revealed both conserved and species-specific patterns of epigenetic evolution. Finally, we show that comparisons of the epigenetic landscape between species can reveal elements with similar roles in regulation, even in the absence of genomic sequence alignment.