Premise: TagSeq is a cost‐effective approach for gene expression studies requiring a large number of samples. To date, TagSeq studies in plants have been limited to those with a high‐quality reference genome. We tested the suitability of reference transcriptomes for TagSeq in non‐model plants, as part of a study of natural gene expression variation at the Santa Rita Experimental Range National Ecological Observatory Network (NEON) core site. Methods: Tissue for TagSeq was sampled from multiple individuals of four species (Bouteloua aristidoides and Eragrostis lehmanniana [Poaceae], Tidestromia lanuginosa [Amaranthaceae], and Parkinsonia florida [Fabaceae]) at two locations on three dates (56 samples total). One sample per species was used to create a reference transcriptome via standard RNA‐seq. TagSeq performance was assessed by recovery of reference loci, specificity of tag alignments, and variation among samples. Results: A high fraction of tags aligned to each reference and mapped uniquely. Expression patterns were quantifiable for tens of thousands of loci, which revealed consistent spatial differentiation in expression for all species. Discussion: TagSeq using de novo reference transcriptomes was an effective approach to quantifying gene expression in this study. Tags were highly locus specific and generated biologically informative profiles for four non‐model plant species.
Pilot RNA‐seq data from 24 species of vascular plants at Harvard Forest
Premise: Large‐scale projects such as the National Ecological Observatory Network (NEON) collect ecological data on entire biomes to track climate change. NEON provides an opportunity to launch community transcriptomic projects that ask integrative questions in ecology and evolution. We conducted a pilot study to investigate the challenges of collecting RNA‐seq data from diverse plant communities. Methods: We generated >650 Gbp of RNA‐seq for 24 vascular plant species representing 12 genera and nine families at the Harvard Forest NEON site. Each species was sampled twice in 2016 (July and August). We assessed transcriptome quality and content with TransRate, BUSCO, and Gene Ontology annotations. Results: Only modest differences in assembly quality were observed across multiple k‐mers. On average, transcriptomes contained hits to >70% of loci in the BUSCO database. We found no significant difference in the number of assembled and annotated transcripts between diploid and polyploid transcriptomes. Discussion: We provide new RNA‐seq data sets for 24 species of vascular plants in Harvard Forest. Challenges associated with this type of study included recovery of high‐quality RNA from diverse species and access to NEON sites for genomic sampling. Overcoming these challenges offers opportunities for large‐scale studies at the intersection of ecology and genomics.
- PAR ID: 10453213
- Publisher / Repository: Wiley Blackwell (John Wiley & Sons)
- Date Published:
- Journal Name: Applications in Plant Sciences
- Volume: 9
- Issue: 2
- ISSN: 2168-0450
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
McIntyre, L (Ed.) Abstract: The adelgids (Adelgidae) are a small family of sap-feeding insects, which, together with true aphids (Aphididae) and phylloxerans (Phylloxeridae), make up the infraorder Aphidomorpha. Some adelgid species, such as Adelges tsugae, Adelges piceae, Adelges laricis, Pineus pini, and Pineus boerneri, are highly destructive to forest ecosystems. Despite this, there are no high-quality genomic resources for adelgids, hindering advanced genomic analyses within Adelgidae and among Aphidomorpha. Here, we used PacBio continuous long-read and Illumina RNA-sequencing to construct a high-quality draft genome assembly for the Cooley spruce gall adelgid, Adelges cooleyi (Gillette), a gall-forming species endemic to North America. The assembled genome is 270.2 Mb in total size and has scaffold and contig N50 statistics of 14.87 and 7.18 Mb, respectively. There are 24,967 predicted coding sequences, and the assembly completeness is estimated at 98.1 and 99.6% with core BUSCO gene sets of Arthropoda and Hemiptera, respectively. Phylogenomic analysis using the A. cooleyi genome, 3 publicly available adelgid transcriptomes, 4 phylloxera transcriptomes, the Daktulosphaira vitifoliae (grape phylloxera) genome, 4 aphid genomes, and 2 outgroup coccoid genomes fully resolves adelgids and phylloxerans as sister taxa. The mitochondrial genome is 24 kb, among the largest in insects sampled to date, with 39.4% composed of noncoding regions. This genome assembly is currently the only genome-scale, annotated assembly for adelgids and will be a valuable resource for understanding the ecology and evolution of Aphidomorpha.
-
Abstract: Premise: Pectocarya recurvata (Boraginaceae, subfamily Cynoglossoideae), a species native to the Sonoran Desert (North America), has served as a model system for a suite of ecological and evolutionary studies. However, no reference genomes are currently available in Cynoglossoideae. A high‐quality reference genome for P. recurvata would be valuable for addressing questions in this system and across broader taxonomic scales. Methods: Using PacBio HiFi sequencing, we assembled a reference genome for P. recurvata and annotated coding regions with full‐length transcripts from an Iso‐Seq library. We assessed genome completeness with BUSCO and k‐mer analysis, and estimated the genome size of six individuals using flow cytometry. Results: The chromosome‐scale genome assembly for P. recurvata was 216.0 Mbp long (N50 = 12.1 Mbp). Previous observations indicated P. recurvata is 2n = 24. Our assembly included 12 primary contigs (158.3 Mbp) containing 30,655 genes with telomeres at 23 out of 24 ends. Flow cytometry measurements from the same population included two plants with 1C = 196.9 Mbp, the smallest measured for Boraginaceae, and four with 1C = 385.8 Mbp, which is consistent with tetraploidy in this population. Discussion: The P. recurvata genome assembly and annotation provide a high‐quality genomic resource in a sparsely represented area of the angiosperm phylogeny. This new reference genome will facilitate answering open questions in ecophysiology, biogeography, and systematics.
-
Abstract: Background: Although RNA-seq data are traditionally used for quantifying gene expression levels, the same data could be useful in an integrated approach to compute genetic distances as well. Challenges to using mRNA sequences for computing genetic distances include the relatively high conservation of coding sequences and the presence of paralogous and, in some species, homeologous genes. Results: We developed a new computational method, RNA-clique, for calculating genetic distances using assembled RNA-seq data and assessed the efficacy of the method using biological and simulated data. The method employs reciprocal BLASTn followed by graph-based filtering to ensure that only orthologous genes are compared. Each vertex in the graph constructed for filtering represents a gene in a specific sample under comparison, and an edge connects a pair of vertices if the genes they represent are best matches for each other in their respective samples. The distance computation is a function of the BLAST alignment statistics and the constructed graph and incorporates only those genes that are present in some complete connected component of this graph. As a biological testbed we used RNA-seq data of tall fescue (Lolium arundinaceum), an allohexaploid plant (2n = 14 Gb), and bluehead wrasse (Thalassoma bifasciatum), a teleost fish. RNA-clique reliably distinguished individual tall fescue plants by genotype and distinguished bluehead wrasse RNA-seq samples by individual. In tests with simulated RNA-seq data, the ground truth phylogeny was accurately recovered from the computed distances. Moreover, tests of the algorithm parameters indicated that, even with stringent filtering for orthologs, sufficient sequence data were retained for the distance computations. Although comparisons with an alternative method revealed that RNA-clique has relatively high time and memory requirements, the comparisons also showed that RNA-clique’s results were at least as reliable as the alternative’s for tall fescue data and were much more reliable for the bluehead wrasse data. Conclusion: Results of this work indicate that RNA-clique works well as a way of deriving genetic distances from RNA-seq data, thus providing a methodological integration of functional and genetic diversity studies.
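The graph-based filtering step described in the abstract above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the `best_hits` table layout, and the toy gene identifiers are all hypothetical, and the BLASTn step is replaced by precomputed best-hit tables. The sketch builds one vertex per (sample, gene), connects two vertices only when each gene is the other's best hit, and keeps only connected components in which every pair of vertices is adjacent (complete components).

```python
from collections import defaultdict
from itertools import combinations


def reciprocal_best_hit_edges(best_hits):
    """Return edges between genes that are each other's best hit.

    best_hits (hypothetical layout) maps (query_sample, target_sample)
    to a dict {query_gene: best_target_gene}.
    """
    edges = set()
    for (q_sample, t_sample), hits in best_hits.items():
        if q_sample == t_sample:
            continue  # only compare genes from different samples
        reverse = best_hits.get((t_sample, q_sample), {})
        for q_gene, t_gene in hits.items():
            if reverse.get(t_gene) == q_gene:
                edges.add(frozenset([(q_sample, q_gene), (t_sample, t_gene)]))
    return edges


def complete_components(edges):
    """Return connected components in which every vertex pair is adjacent."""
    adjacency = defaultdict(set)
    for edge in edges:
        u, v = tuple(edge)
        adjacency[u].add(v)
        adjacency[v].add(u)

    seen = set()
    kept = []
    for start in adjacency:
        if start in seen:
            continue
        # Depth-first search to collect one connected component.
        component = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        # Keep the component only if it is a complete graph.
        if all(v in adjacency[u] for u, v in combinations(component, 2)):
            kept.append(component)
    return kept


# Toy data (assumed): genes a1/b1/c1 are mutual best hits in all three
# samples; a2-b2 and b2-c2 are reciprocal, but a2's best hit in C is c3.
best_hits = {
    ("A", "B"): {"a1": "b1", "a2": "b2"},
    ("B", "A"): {"b1": "a1", "b2": "a2"},
    ("A", "C"): {"a1": "c1", "a2": "c3"},
    ("C", "A"): {"c1": "a1", "c2": "a2"},
    ("B", "C"): {"b1": "c1", "b2": "c2"},
    ("C", "B"): {"c1": "b1", "c2": "b2"},
}
kept = complete_components(reciprocal_best_hit_edges(best_hits))
```

In the toy data the a2, b2, c2 group is connected but not complete, so it is dropped, which is the kind of case where a paralog may have been matched in one sample. A practical version would presumably also require one gene from every sample per retained component and would carry alignment statistics along each edge for the distance computation; both are left out here.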
-
Abstract: Over the past three decades, the Harvard Forest Summer Research Program in Ecology (HF‐SRPE) has been at the forefront of expanding the ecological tent for minoritized or otherwise marginalized students. By broadening the definition of ecology to include fields such as data science, software engineering, and remote sensing, we attract a broader range of students, including those who may not prioritize field experiences or who may feel unsafe working in rural or urban field sites. We also work towards a more resilient society in which minoritized or marginalized students can work safely, in part by building teams of students and mentors. Teams collaborate on projects that require a diversity of approaches and create opportunities for students and mentors alike to support one another and share leadership. Finally, HF‐SRPE promotes an expanded view of what it means to become an ecologist. We value and support diverse career paths for ecologists to work in all parts of society, to diversify the face of ecology, and to bring different perspectives together to ensure innovations in environmental problem solving for our planet.
