Title: No Evidence of Copy Number Variation in Acidic Mammalian Chitinase Genes (CHIA) in New World and Old World Monkeys
Copy number variation may be the most common form of structural genetic variation in the genome. Numerous studies have shown that gene copy number variation can correlate with phenotypic variation, where higher copy numbers correspond to increased expression of the protein and vice versa. Examples include some digestive enzyme genes, where variation in copy numbers and protein expression may be related to dietary differences. Increasing the expression of a digestive enzyme through higher gene copy numbers may thus be a potential mechanism for altering an organism’s digestive capabilities. I investigated copy number variation in genes coding for acidic mammalian chitinase, a chitinolytic digestive enzyme that may be used for the digestion of insect exoskeletons, in nonhuman primates with varying levels of insect consumption. I hypothesized that CHIA copy number correlates positively with level of insectivory, predicting higher copy numbers in more insectivorous primates. I assessed copy number variation with the QuantStudio 3D digital PCR platform, in a comparative sample of Old World and New World primate species (N = 10 species, one or two individuals each). Contrary to my prediction, no evidence of copy number variation was found and all species tested had two gene copies per diploid genome. These findings suggest that if acidic mammalian chitinase expression varies according to insect consumption in primates, it may be up- or downregulated through another mechanism.
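As a rough illustration of how a digital PCR readout is commonly converted into a copy number per diploid genome, the sketch below applies the standard Poisson correction to the fraction of positive partitions and scales the target-to-reference ratio by the reference gene's known copy number. This is not the author's analysis pipeline: the function names, the well counts, and the assumption that target and reference are read from the same set of analyzed wells against a two-copy reference gene are all hypothetical and chosen only for illustration.

```python
import math


def copies_per_partition(positive: int, total: int) -> float:
    """Poisson-corrected mean number of template copies per partition.

    If a fraction p of partitions is positive, the mean occupancy is
    lambda = -ln(1 - p).
    """
    if total <= 0 or not 0 <= positive < total:
        raise ValueError("need 0 <= positive < total and total > 0")
    return -math.log(1.0 - positive / total)


def copies_per_diploid_genome(target_positive: int,
                              reference_positive: int,
                              total_partitions: int,
                              reference_copies: int = 2) -> float:
    """Estimate target gene copies per diploid genome.

    Assumes (hypothetically) that target and reference assays share the
    same analyzed partitions and that the reference gene is present at
    `reference_copies` copies per diploid genome.
    """
    lam_target = copies_per_partition(target_positive, total_partitions)
    lam_reference = copies_per_partition(reference_positive, total_partitions)
    return reference_copies * lam_target / lam_reference


# Illustrative counts only, not data from the study.
equal_signal = copies_per_diploid_genome(6000, 6000, 20000)
doubled_signal = copies_per_diploid_genome(10200, 6000, 20000)
print(round(equal_signal, 2), round(doubled_signal, 2))
```

With equal target and reference signal, the estimate is two copies per diploid genome, which is the pattern the abstract reports for all species tested. A target with twice the Poisson-corrected occupancy of the reference would instead yield an estimate of four.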
Award ID(s):
1650864
NSF-PAR ID:
10099914
Author(s) / Creator(s):
Date Published:
Journal Name:
International journal of primatology
ISSN:
1573-8604
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. INTRODUCTION Diverse phenotypes, including large brains relative to body size, group living, and vocal learning ability, have evolved multiple times throughout mammalian history. These shared phenotypes may have arisen repeatedly by means of common mechanisms discernible through genome comparisons. RATIONALE Protein-coding sequence differences have failed to fully explain the evolution of multiple mammalian phenotypes. This suggests that these phenotypes have evolved at least in part through changes in gene expression, meaning that their differences across species may be caused by differences in genome sequence at enhancer regions that control gene expression in specific tissues and cell types. Yet the enhancers involved in phenotype evolution are largely unknown. Sequence conservation–based approaches for identifying such enhancers are limited because enhancer activity can be conserved even when the individual nucleotides within the sequence are poorly conserved. This is due to an overwhelming number of cases where nucleotides turn over at a high rate, but a similar combination of transcription factor binding sites and other sequence features can be maintained across millions of years of evolution, allowing the function of the enhancer to be conserved in a particular cell type or tissue. Experimentally measuring the function of orthologous enhancers across dozens of species is currently infeasible, but new machine learning methods make it possible to make reliable sequence-based predictions of enhancer function across species in specific tissues and cell types. RESULTS To overcome the limits of studying individual nucleotides, we developed the Tissue-Aware Conservation Inference Toolkit (TACIT). Rather than measuring the extent to which individual nucleotides are conserved across a region, TACIT uses machine learning to test whether the function of a given part of the genome is likely to be conserved. 
More specifically, convolutional neural networks learn the tissue- or cell type–specific regulatory code connecting genome sequence to enhancer activity using candidate enhancers identified from only a few species. This approach allows us to accurately associate differences between species in tissue or cell type–specific enhancer activity with genome sequence differences at enhancer orthologs. We then connect these predictions of enhancer function to phenotypes across hundreds of mammals in a way that accounts for species’ phylogenetic relatedness. We applied TACIT to identify candidate enhancers from motor cortex and parvalbumin neuron open chromatin data that are associated with brain size relative to body size, solitary living, and vocal learning across 222 mammals. Our results include the identification of multiple candidate enhancers associated with brain size relative to body size, several of which are located in linear or three-dimensional proximity to genes whose protein-coding mutations have been implicated in microcephaly or macrocephaly in humans. We also identified candidate enhancers associated with the evolution of solitary living near a gene implicated in separation anxiety and other enhancers associated with the evolution of vocal learning ability. We obtained distinct results for bulk motor cortex and parvalbumin neurons, demonstrating the value in applying TACIT to both bulk tissue and specific minority cell type populations. To facilitate future analyses of our results and applications of TACIT, we released predicted enhancer activity of >400,000 candidate enhancers in each of 222 mammals and their associations with the phenotypes we investigated. CONCLUSION TACIT leverages predicted enhancer activity conservation rather than nucleotide-level conservation to connect genetic sequence differences between species to phenotypes across large numbers of mammals. 
TACIT can be applied to any phenotype with enhancer activity data available from at least a few species in a relevant tissue or cell type and a whole-genome alignment available across dozens of species with substantial phenotypic variation. Although we developed TACIT for transcriptional enhancers, it could also be applied to genomic regions involved in other components of gene regulation, such as promoters and splicing enhancers and silencers. As the number of sequenced genomes grows, machine learning approaches such as TACIT have the potential to help make sense of how conservation of, or changes in, subtle genome patterns can help explain phenotype evolution. Tissue-Aware Conservation Inference Toolkit (TACIT) associates genetic differences between species with phenotypes. TACIT works by generating open chromatin data from a few species in a tissue related to a phenotype, using the sequences underlying open and closed chromatin regions to train a machine learning model for predicting tissue-specific open chromatin and associating open chromatin predictions across dozens of mammals with the phenotype. [Species silhouettes are from PhyloPic] 
  2. INTRODUCTION Transposable elements (TEs), repeat expansions, and repeat-mediated structural rearrangements play key roles in chromosome structure and species evolution, contribute to human genetic variation, and substantially influence human health through copy number variants, structural variants, insertions, deletions, and alterations to gene transcription and splicing. Despite their formative role in genome stability, repetitive regions have been relegated to gaps and collapsed regions in human genome reference GRCh38 owing to the technological limitations during its development. The lack of linear sequence in these regions, particularly in centromeres, resulted in the inability to fully explore the repeat content of the human genome in the context of both local and regional chromosomal environments. RATIONALE Long-read sequencing supported the complete, telomere-to-telomere (T2T) assembly of the pseudo-haploid human cell line CHM13. This resource affords a genome-scale assessment of all human repetitive sequences, including TEs and previously unknown repeats and satellites, both within and outside of gaps and collapsed regions. Additionally, a complete genome enables the opportunity to explore the epigenetic and transcriptional profiles of these elements that are fundamental to our understanding of chromosome structure, function, and evolution. Comparative analyses reveal modes of repeat divergence, evolution, and expansion or contraction with locus-level resolution. RESULTS We implemented a comprehensive repeat annotation workflow using previously known human repeats and de novo repeat modeling followed by manual curation, including assessing overlaps with gene annotations, segmental duplications, tandem repeats, and annotated repeats. Using this method, we developed an updated catalog of human repetitive sequences and refined previous repeat annotations. 
We discovered 43 previously unknown repeats and repeat variants and characterized 19 complex, composite repetitive structures, which often carry genes, across T2T-CHM13. Using precision nuclear run-on sequencing (PRO-seq) and CpG methylated sites generated from Oxford Nanopore Technologies long-read sequencing data, we assessed RNA polymerase engagement across retroelements genome-wide, revealing correlations between nascent transcription, sequence divergence, CpG density, and methylation. These analyses were extended to evaluate RNA polymerase occupancy for all repeats, including high-density satellite repeats that reside in previously inaccessible centromeric regions of all human chromosomes. Moreover, using both mapping-dependent and mapping-independent approaches across early developmental stages and a complete cell cycle time series, we found that engaged RNA polymerase across satellites is low; in contrast, TE transcription is abundant and serves as a boundary for changes in CpG methylation and centromere substructure. Together, these data reveal the dynamic relationship between transcriptionally active retroelement subclasses and DNA methylation, as well as potential mechanisms for the derivation and evolution of new repeat families and composite elements. Focusing on the emerging T2T-level assembly of the HG002 X chromosome, we reveal that a high level of repeat variation likely exists across the human population, including composite element copy numbers that affect gene copy number. Additionally, we highlight the impact of repeats on the structural diversity of the genome, revealing repeat expansions with extreme copy number differences between humans and primates while also providing high-confidence annotations of retroelement transduction events. 
CONCLUSION The comprehensive repeat annotations and updated repeat models described herein serve as a resource for expanding the compendium of human genome sequences and reveal the impact of specific repeats on the human genome. In developing this resource, we provide a methodological framework for assessing repeat variation within and between human genomes. The exhaustive assessment of the transcriptional landscape of repeats, at both the genome scale and locally, such as within centromeres, sets the stage for functional studies to disentangle the role transcription plays in the mechanisms essential for genome stability and chromosome segregation. Finally, our work demonstrates the need to increase efforts toward achieving T2T-level assemblies for nonhuman primates and other species to fully understand the complexity and impact of repeat-derived genomic innovations that define primate lineages, including humans. Telomere-to-telomere assembly of CHM13 supports repeat annotations and discoveries. The human reference T2T-CHM13 filled gaps and corrected collapsed regions (triangles) in GRCh38. Combining long read–based methylation calls, PRO-seq, and multilevel computational methods, we provide a compendium of human repeats, define retroelement expression and methylation profiles, and delineate locus-specific sites of nascent transcription genome-wide, including previously inaccessible centromeres. SINE, short interspersed element; SVA, SINE–variable number tandem repeat– Alu ; LINE, long interspersed element; LTR, long terminal repeat; TSS, transcription start site; pA, xxxxxxxxxxxxxxxx. 
  3. Abstract

    Digestion is driven by digestive enzymes, and digestive enzyme gene copy number can provide insights into the genomic underpinnings of dietary specialization. The “Adaptive Modulation Hypothesis” (AMH) proposes that digestive enzyme activity, which increases with increased gene copy number, should correlate with substrate quantity in the diet. To test the AMH and reveal some of the genetics of herbivory vs. carnivory, we sequenced, assembled, and annotated the genome of Anoplarchus purpurescens, a carnivorous prickleback fish in the family Stichaeidae, and compared the gene copy number for key digestive enzymes to that of Cebidichthys violaceus, a herbivorous fish from the same family. A highly contiguous genome assembly of high quality (N50 = 10.6 Mb) was produced for A. purpurescens, using combined long-read and short-read technology, with an estimated 33,842 protein-coding genes. The digestive enzymes that we examined include pancreatic α-amylase, carboxyl ester lipase, alanyl aminopeptidase, trypsin, and chymotrypsin. Anoplarchus purpurescens had fewer copies of pancreatic α-amylase (carbohydrate digestion) than C. violaceus (1 vs. 3 copies). Moreover, A. purpurescens had one fewer copy of carboxyl ester lipase (plant lipid digestion) than C. violaceus (4 vs. 5). We observed an expansion in copy number for several protein digestion genes in A. purpurescens compared to C. violaceus, including trypsin (5 vs. 3) and total aminopeptidases (6 vs. 5). Collectively, these genomic differences coincide with measured digestive enzyme activities (phenotypes) in the two species and they support the AMH. Moreover, this genomic resource is now available to better understand fish biology and dietary specialization.

     
  4. Abstract

    Although cases of independent adaptation to the same dietary niche have been documented in mammalian ecology, the molecular correlates of such shifts are seldom known. Here, we used genomewide analyses of molecular evolution to examine two lineages of bats that, from an insectivorous ancestor, have both independently evolved obligate frugivory: the Old World family Pteropodidae and the neotropical subfamily Stenodermatinae. New genome assemblies from two neotropical fruit bats (Artibeus jamaicensis and Sturnira hondurensis) provide a framework for comparisons with Old World fruit bats. Comparative genomics of 10 bat species encompassing dietary diversity across the phylogeny revealed convergent molecular signatures of frugivory in both multigene family evolution and single‐copy genes. Evidence for convergent molecular adaptations associated with frugivorous diets includes the composition of three subfamilies of olfactory receptor genes, losses of three bitter taste receptor genes, losses of two digestive enzyme genes and convergent amino acid substitutions in several metabolic genes. By identifying suites of adaptations associated with the convergent evolution of frugivory, our analyses both reveal the extent of molecular mechanisms under selection in dietary shifts and will facilitate future studies of molecular ecology in mammals.

     
  5. Many mammals can digest starch by using an enzyme called amylase, but different species eat different amounts of starchy foods. Amylase is released by the pancreas, and in certain species such as humans, it is also created by the glands that produce saliva, allowing the enzyme to be present in the mouth. There, amylase can start to break down starch, releasing a sweet taste that helps the animal to detect starchy foods. Curiously, humans have multiple copies of the gene that codes for the enzyme, but the exact number varies between people. Previous research has found that populations with more copies also eat more starch; if this correlation also existed in other species, it could help to understand how diets influence and shape genetic information. In addition, it is unclear how amylase came to be present in saliva, as the ancestors of mammals only produced the protein in the pancreas. Pajic et al. analyzed the genomes of a range of mammals and found that the more starch a species had in its diet, the more amylase gene copies it harbored in its genome. In fact, unrelated mammals living in different habitats and eating different types of food have similar numbers of amylase gene copies if they have the same level of starch in their diet. In addition, Pajic et al. discovered that animals such as mice, rats, pigs and dogs, which have lived in close contact with people for thousands of years, quickly adapted to the large amount of starch present in human food. In each of these species, a mechanism called gene duplication independently created new copies of the amylase gene. This could represent the first step towards some of these copies becoming active in the glands that release saliva. In people, having fewer copies of the amylase gene could mean they have a higher risk for diabetes; this number is also tied to the composition of the collection of bacteria that live in the mouth and the gut. 
Understanding how the copy number of the amylase gene affects biology will help to grasp how it also affects health and wellbeing, in humans and in our four-legged companions. 