
Title: High-throughput detection of T-DNA insertion sites for multiple transgenes in complex genomes
Abstract Background

Genetic engineering of crop plants has been successful in transferring traits into elite lines beyond what can be achieved with breeding techniques. Introduction of transgenes originating from other species has conferred resistance to biotic and abiotic stresses, increased efficiency, and modified developmental programs. The next challenge is to stack multiple transgenes in elite varieties so that their traits are combined. Generating stable homozygous lines with multiple transgenes requires selection across segregating generations, which is time-consuming and labor-intensive, especially if the crop is polyploid. Insertion site effects and transgene copy number are important metrics for commercialization and trait efficiency.


Results

We have developed a simple method to identify the sites of transgene insertions using T-DNA-specific primers and high-throughput sequencing that enables identification of multiple insertion sites in the T1 generation of any crop transformed via Agrobacterium. We present an example using the allohexaploid oil-seed plant Camelina sativa to determine insertion site location of two transgenes.


Conclusions

This new methodology enables the early selection of desirable transgene location and copy number to generate homozygous lines within two generations.
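
To illustrate why selection across segregating generations is laborious when transgenes are stacked, the following is a minimal sketch of the expected Mendelian arithmetic. It is not part of the published method. It assumes each insertion is single-copy, hemizygous in the selfed parent, unlinked to the others, and inherited disomically; the function names are hypothetical.

```python
import math
from fractions import Fraction


def homozygous_fraction(n_loci: int) -> Fraction:
    """Expected fraction of selfed progeny homozygous at every insertion.

    Assumption: each locus is hemizygous in the parent and segregates
    independently, so each one is homozygous in 1/4 of the progeny.
    """
    if n_loci < 1:
        raise ValueError("n_loci must be at least 1")
    return Fraction(1, 4) ** n_loci


def plants_to_screen(n_loci: int, confidence: float = 0.95) -> int:
    """Smallest progeny count giving at least `confidence` probability
    of recovering one plant homozygous at all insertions."""
    if not 0 < confidence < 1:
        raise ValueError("confidence must be between 0 and 1")
    p = float(homozygous_fraction(n_loci))
    # Solve 1 - (1 - p)^k >= confidence for the smallest integer k.
    return math.ceil(math.log(1 - confidence) / math.log(1 - p))


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, homozygous_fraction(n), plants_to_screen(n))
```

Under these assumptions the fraction of fully homozygous progeny falls by a factor of four with each additional insertion, which is why knowing insertion sites and copy number early reduces the screening burden.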

Journal Name: BMC Genomics
Publisher: Springer Science + Business Media
Sponsoring Org: National Science Foundation
More Like this
  1. Summary

    Genetic transformation is a powerful means for the improvement of crop plants, but requires labor‐ and resource‐intensive methods. An efficient method for identifying single‐copy transgene insertion events from a population of independent transgenic lines is desirable. Currently, transgene copy number is estimated by either Southern blot hybridization analyses or quantitative polymerase chain reaction (qPCR) experiments. Southern hybridization is a convincing and reliable method, but it also is expensive, time‐consuming and often requires a large amount of genomic DNA and radioactively labeled probes. Alternatively, qPCR requires less DNA and is potentially simpler to perform, but its results can lack the accuracy and precision needed to confidently distinguish between one‐ and two‐copy events in transgenic plants with large genomes. To address this need, we developed a droplet digital PCR‐based method for transgene copy number measurement in an array of crops: rice, citrus, potato, maize, tomato and wheat. The method utilizes specific primers to amplify target transgenes, and endogenous reference genes in a single duplexed reaction containing thousands of droplets. Endpoint amplicon production in the droplets is detected and quantified using sequence‐specific fluorescently labeled probes. The results demonstrate that this approach can generate confident copy number measurements in independent transgenic lines in these crop species. This method and the compendium of probes and primers will be a useful resource for the plant research community, enabling the simple and accurate determination of transgene copy number in these six important crop species.

  2. Abstract Background

    Crop improvement through cross-population genomic prediction and genome editing requires identification of causal variants at high resolution, within fewer than hundreds of base pairs. Most genetic mapping studies have generally lacked such resolution. In contrast, evolutionary approaches can detect genetic effects at high resolution, but they are limited by shifting selection, missing data, and low depth of multiple-sequence alignments. Here we use genomic annotations to accurately predict nucleotide conservation across angiosperms, as a proxy for fitness effect of mutations.


    Results

    Using only sequence analysis, we annotate nonsynonymous mutations in 25,824 maize gene models, with information from bioinformatics and deep learning. Our predictions are validated by experimental information: within-species conservation, chromatin accessibility, and gene expression. According to gene ontology and pathway enrichment analyses, predicted nucleotide conservation points to genes in central carbon metabolism. Importantly, it improves genomic prediction for fitness-related traits such as grain yield, in elite maize panels, by stringent prioritization of fewer than 1% of single-site variants.


    Conclusions

    Our results suggest that predicting nucleotide conservation across angiosperms may effectively prioritize sites most likely to impact fitness-related traits in crops, without being limited by shifting selection, missing data, and low depth of multiple-sequence alignments. Our approach, Prediction of mutation Impact by Calibrated Nucleotide Conservation (PICNC), could be useful to select polymorphisms for accurate genomic prediction, and candidate mutations for efficient base editing. The trained PICNC models and predicted nucleotide conservation at protein-coding SNPs in maize are publicly available in CyVerse (

  3. Abstract

    The Chinese hamster ovary (CHO) cell lines that are used to produce commercial quantities of therapeutic proteins commonly exhibit a decrease in productivity over time in culture, a phenomenon termed production instability. Random integration of the transgenes encoding the protein of interest into locations in the CHO genome that are vulnerable to genetic and epigenetic instability often causes production instability through copy number loss and silencing of expression. Several recent publications have shown that these cell line development challenges can be overcome by using site‐specific integration (SSI) technology to insert the transgenes at genomic loci, often called “hotspots,” that are transcriptionally permissive and have enhanced stability relative to the rest of the genome. However, extensive characterization of the CHO epigenome is needed to identify hotspots that maintain their desirable epigenetic properties in an industrial bioprocess environment and maximize transcription from a single integrated transgene copy. To this end, the epigenomes and transcriptomes of two distantly related cell lines, an industrially relevant monoclonal antibody‐producing cell line and its parental CHO‐K1 host, were characterized using high‐throughput chromosome conformation capture and RNAseq to analyze changes in the epigenome that occur during cell line development and associated changes in system‐wide gene expression. In total, 10.9% of the CHO genome contained transcriptionally permissive three‐dimensional chromatin structures with enhanced genetic and epigenetic stability relative to the rest of the genome. These safe harbor regions also showed good agreement with published CHO epigenome data, demonstrating that this method was suitable for finding genomic regions with epigenetic markers of active and stable gene expression. These regions significantly reduce the genomic search space when looking for CHO hotspots with widespread applicability and can guide future studies with the goal of maximizing the potential of SSI technology in industrial production CHO cell lines.

  4. Abstract Aim

    Climate change regulates autumn leaf senescence date (LSD), exhibiting a strong phenological control of plant carbon uptake. Unlike the delaying effect of daily mean temperature (Tmean) on LSD, the impact of warming asymmetry in daytime and nighttime, as evidenced by variations of the diurnal temperature range (DTR), remains elusive. The objectives of this study were to investigate physiological and ecological impacts of DTR on LSD using long‐term in situ observations and to predict the future trends of LSD under warming.



    Time period


    Major taxa studied

    Plant phenology.


    Methods

    We used partial correlation analysis, multiple linear regression and ridge regression to explore the impacts of DTR on LSD. To quantify the importance of potential drivers of LSD, we trained random forest models and applied the SHapley Additive exPlanations method to isolate the marginal contributions of each predictor on LSD. For LSD modelling and projection, we first evaluated two temperature‐driven LSD models [i.e., cooling‐degree‐day (CDD, without DTR effect) and day–night‐temperature CDD (DNCDD, with DTR effect)], then applied them to predict future LSDs.


    Results

    We found that observational increases in Tmean and DTR had contrasting effects on LSD. Increased Tmean delayed the LSD, whereas larger DTR overall had an advancing effect. Considering the DTR effect, the Tmean sensitivity of LSD was 14% lower than presently estimated (2.4 vs. 2.8 days °C⁻¹). Warming asymmetry‐related drought stress and plant functional traits (i.e., plant isohydricity and water‐use efficiency) potentially explained the advancing effect of DTR on LSD. We found that current projections of future LSD are overestimated because the DTR effect is discounted, suggesting the need for an adequate understanding of how plant phenology responds to warming asymmetry.

    Main conclusions

    Our findings highlight the importance of DTR in controlling LSD variations with an advancing‐dominant effect and call for the improvement of phenology modelling incorporating the DTR effect. Given that DTR showed a globally narrowing trend over the last several decades, more efforts are needed to understand the potential ecological impacts of warming asymmetry and vegetation response to climate change.

  5. The present study investigated the efficiency of CRISPR/Cas9 in creating genomic deletions as the basis of its application in removing selection marker genes or the intergenic regions. Three loci, representing a transgene and two rice genes, were targeted at two sites each, in separate experiments, and the deletion of the defined fragments was investigated by PCR and sequencing. Genomic deletions were found at a low rate among the transformed callus lines that could be isolated, cultured, and regenerated into plants harboring the deletion. However, randomly regenerated plants showed mixed genomic effects, and generally did not harbor heritable genomic deletions. To determine whether point mutations occurred at each targeted site, a total of 114 plants consisting of primary transgenic lines and their progeny were analyzed. Ninety-three plants showed targeting, 60 of which were targeted at both sites. The presence of point mutations at both sites was correlated with the guide RNA efficiency. In summary, genomic deletions through dual-targeting by the paired-guide RNAs were generally observed in callus, while de novo point mutations at one or both sites occurred at high rates in transgenic plants and their progeny, generating a variety of insertion–deletions or single-nucleotide variations. In this study, point mutations were exceedingly favored over genomic deletions; therefore, for the recovery of plant lines harboring targeted deletions, identifying early transformed clones harboring the deletions, and isolating them for plant regeneration is recommended.