ABSTRACT For most species, transcriptome data are much more readily available than genome data. Without a reference genome, gene calling is cumbersome and inaccurate because of the high degree of redundancy in de novo transcriptome assemblies. To simplify and increase the accuracy of de novo transcriptome assembly in the absence of a reference genome, we developed UnigeneFinder. Combining several clustering methods, UnigeneFinder substantially reduces the redundancy typical of raw transcriptome assemblies. This pipeline offers an effective solution to the problem of inflated transcript numbers, achieving a closer representation of the actual underlying genome. UnigeneFinder performs comparably to or better than existing tools on plant species with varying genome complexities. UnigeneFinder is the only available transcriptome redundancy solution that fully automates the generation of primary transcript, coding region, and protein sequences, analogous to those available for high‐quality reference genomes. These features, coupled with the pipeline’s cross‐platform implementation, focus on automation, and accessible, user‐friendly interface, make UnigeneFinder a useful tool for many downstream sequence‐based analyses in nonmodel organisms lacking a reference genome, including differential gene expression analysis, accurate ortholog identification, functional enrichments, and evolutionary analyses. UnigeneFinder also runs efficiently on both high‐performance computing (HPC) systems and personal computers, further reducing barriers to use.
                    
                            
                            Divergence of cochlear transcriptomics between reference‑based and reference‑free transcriptome analyses among Rhinolophus ferrumequinum populations
                        
                    
    
            Differences in gene expression within tissues can lead to differences in tissue function. Understanding the transcriptome of a species helps elucidate the molecular mechanisms underlying phenotypic divergence. Depending on whether a reference genome is available for the studied species, transcriptome analyses can be divided into reference‑based and reference‑free methods. Presently, comparisons of complete transcriptome analysis results between those two methods are still rare. In this study, we compared the cochlear transcriptome analysis results of greater horseshoe bats (Rhinolophus ferrumequinum) from three lineages in China with different acoustic phenotypes using reference‑based and reference‑free methods to explore their differences in subsequent analysis. The results obtained by the reference-based method had lower false-positive rates and were more accurate because differentially expressed genes among the three populations obtained by this method had greater reliability and a higher annotation rate. Some phenotype-related enrichment terms, including those related to inorganic molecules and proton transmembrane channels, were also obtained only by the reference-based method. However, the reference‑based method might have the limitation of incomplete information acquisition. Thus, we believe that a combination of reference‑free and reference‑based methods is ideal for transcriptome analyses. The results of our study provide a reference for the selection of transcriptome analysis methods in the future.
        
    
- Award ID(s): 1911853
- PAR ID: 10516015
- Editor(s): Verma, Shailender Kumar
- Publisher / Repository: PLOS ONE
- Date Published:
- Journal Name: PLOS ONE
- Volume: 18
- Issue: 7
- ISSN: 1932-6203
- Page Range / eLocation ID: e0288404
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- 
            Foliar chemistry values were obtained from two important native tree species (white oak (Quercus alba L.) and red maple (Acer rubrum L.)) across urban and reference forest sites of three major cities in the eastern United States during summer 2015 (New York, NY (NYC); Philadelphia, PA; and Baltimore, MD). Trees were selected from secondary growth oak-hickory forests found in New York, NY; Philadelphia, PA; and Baltimore, MD, as well as at reference forest sites outside each metropolitan area. In all three metropolitan areas, urban forest patches and reference forest sites were selected based on the presence of red maple and white oak canopy dominant trees in patches of at least 1.5 hectares with slopes less than 25%, and well-drained soils of similar soil series within each metropolitan area. Within each city, several forest patches were selected to capture the variation in forest patch site conditions across an individual city. All reference sites were located in protected areas outside of the city and within intermix wildland-urban interface landscapes, in order to target similar contexts of surrounding land use and population density (Martinuzzi et al. 2015). Several reference sites were selected for each city, located within the same protected area considered representative of rural forests of the region. White oaks were at least 38.1 cm diameter at breast height (DBH), red maples were at least 25.4 cm DBH, and all trees were dominant or co-dominant canopy trees. The trees had no major trunk cavities and had crown vigor scores of 1 or 2 (less than 25% overall canopy damage; Pontius & Hallett 2014). From early July to early August 2015, sun leaves were collected from the periphery of the crown of each tree with either a shotgun or slingshot for subsequent analysis to determine differences in foliar chemistry across cities and urban vs. reference forest site types.
The data were used to investigate whether differences in native tree physiology occur between urban and reference forest patches, and whether those differences are site- and species-specific. A complete analysis of these data can be found in: Sonti, NF. 2019. Ecophysiological and social functions of urban forest patches. Ph.D. dissertation. University of Maryland, College Park, MD. 166 p. References: Martinuzzi S, Stewart SI, Helmers DP, Mockrin MH, Hammer RB, Radeloff VC. 2015. The 2010 wildland-urban interface of the conterminous United States. Research Map NRS-8. US Department of Agriculture, Forest Service, Northern Research Station: Newtown Square, PA. Pontius J, Hallett R. 2014. Comprehensive methods for earlier detection and monitoring of forest decline. Forest Science 60(6): 1156-1163.
- 
            Phylogenomic investigations of biodiversity facilitate the detection of fine-scale population genetic structure and the demographic histories of species and populations. However, determining whether or not the genetic divergence measured among populations reflects species-level differentiation remains a central challenge in species delimitation. One potential solution is to compare genetic divergence between putative new species with other closely related species, sometimes referred to as a reference-based taxonomy. To be described as a new species, a population should be at least as divergent as other species. Here, we develop a reference-based taxonomy for Horned Lizards (Phrynosoma; 17 species) using phylogenomic data (ddRADseq data) to provide a framework for delimiting species in the Greater Short-horned Lizard species complex (P. hernandesi). Previous species delimitation studies of this species complex have produced conflicting results, with morphological data suggesting that P. hernandesi consists of five species, whereas mitochondrial DNA support anywhere from 1 to 10+ species. To help address this conflict, we first estimated a time-calibrated species tree for P. hernandesi and close relatives using SNP data. These results support the paraphyly of P. hernandesi; we recommend the recognition of two species to promote a taxonomy that is consistent with species monophyly. There is strong evidence for three populations within P. hernandesi, and demographic modeling and admixture analyses suggest that these populations are not reproductively isolated, which is consistent with previous morphological analyses that suggest hybridization could be common. Finally, we characterize the population-species boundary by quantifying levels of genetic divergence for all 18 Phrynosoma species. Genetic divergence measures for western and southern populations of P. hernandesi failed to exceed those of other Phrynosoma species, but the relatively small population size estimated for the northern population causes it to appear as a relatively divergent species. These comparisons underscore the difficulties associated with putting a reference-based approach to species delimitation into practice. Nevertheless, the reference-based approach offers a promising framework for the consistent assessment of biodiversity within clades of organisms with similar life histories and ecological traits.
- 
            Claydon, John A. (Ed.) Reef fishes support important fisheries throughout the Caribbean, but a combination of factors in the tropics makes otolith microstructure difficult to interpret for age estimation. Therefore, validation of ageing methods via application of Δ14C is a major research priority. Utilizing known-age otolith material from north Caribbean fishes, we determined that a distinct regional Δ14C chronology exists, differing from coral-based chronologies compiled for ageing validation from a wide-ranging area of the Atlantic and from an otolith-based chronology from the Gulf of Mexico. Our north Caribbean Δ14C chronology established a decline series with narrow prediction intervals that proved successful in ageing validation of three economically important reef fish species. In examining why our north Caribbean Δ14C chronology differed from some of the coral-based Δ14C data reported from the region, we determined that differences among study objectives and research design impact Δ14C temporal relationships. This resulted in establishing the first of three important considerations relevant to applying Δ14C chronologies for ageing validation: 1) evaluation of the applicability of original goal/objectives and study design of potential Δ14C reference studies. Next, we determined that differences between our Δ14C chronology and those from Florida and the Gulf of Mexico were explained by differences in regional patterns of oceanic upwelling, resulting in the second consideration for future validation work: 2) evaluation of the applicability of Δ14C reference data to the region/location where fish samples were obtained. Lastly, we emphasize that the application of our north Caribbean Δ14C chronology should be limited to ageing validation studies of fishes from this region known to inhabit shallow water coral habitat as juveniles. Thus, we note the final consideration to strengthen findings of future age validation studies: 3) use of Δ14C analysis for age validation should be limited to species whose juvenile habitat is known to reflect the regional Δ14C reference chronology.
- 
            Premise: TagSeq is a cost‐effective approach for gene expression studies requiring a large number of samples. To date, TagSeq studies in plants have been limited to those with a high‐quality reference genome. We tested the suitability of reference transcriptomes for TagSeq in non‐model plants, as part of a study of natural gene expression variation at the Santa Rita Experimental Range National Ecological Observatory Network (NEON) core site. Methods: Tissue for TagSeq was sampled from multiple individuals of four species (Bouteloua aristidoides and Eragrostis lehmanniana [Poaceae], Tidestromia lanuginosa [Amaranthaceae], and Parkinsonia florida [Fabaceae]) at two locations on three dates (56 samples total). One sample per species was used to create a reference transcriptome via standard RNA‐seq. TagSeq performance was assessed by recovery of reference loci, specificity of tag alignments, and variation among samples. Results: A high fraction of tags aligned to each reference and mapped uniquely. Expression patterns were quantifiable for tens of thousands of loci, which revealed consistent spatial differentiation in expression for all species. Discussion: TagSeq using de novo reference transcriptomes was an effective approach to quantifying gene expression in this study. Tags were highly locus specific and generated biologically informative profiles for four non‐model plant species.
 An official website of the United States government