Cytonuclear disruption may accompany allopolyploid evolution as a consequence of the merger of different nuclear genomes in a cellular environment having only one set of progenitor organellar genomes. One path to reconcile potential cytonuclear mismatch is biased expression for maternal gene duplicates (homoeologs) encoding proteins that target to plastids and/or mitochondria. Assessment of this transcriptional form of cytonuclear coevolution at the level of individual cells or cell types remains unexplored. Using single-cell (sc-) and single-nucleus (sn-) RNAseq data from eight tissues in three allopolyploid species, we characterized cell type–specific variations of cytonuclear coevolutionary homoeologous expression and demonstrated the temporal dynamics of expression patterns across development stages during cotton fiber development. Our results provide unique insights into transcriptional cytonuclear coevolution in plant allopolyploids at the single-cell level.
Single-cell RNA sequencing is increasingly used to investigate cross-species differences driven by gene expression and cell-type composition in plants. However, the frequent expansion of plant gene families due to whole-genome duplications makes identification of one-to-one orthologues difficult, complicating integration. Here we demonstrate that coexpression can be used to trim many-to-many orthology families down to identify one-to-one gene pairs with proxy expression profiles, improving the performance of traditional integration methods and reducing barriers to integration across a diverse array of plant species.
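The idea of using coexpression to pick a one-to-one pair out of a many-to-many orthology family can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the use of "anchor" genes with already-known one-to-one orthologues, and the scoring by Pearson correlation of coexpression profiles are all assumptions made for illustration.

```python
import numpy as np

def coexpression_profile(expr, anchor_idx):
    """Correlate every gene with a set of anchor genes within ONE species.

    expr: genes x samples array; anchor_idx: row indices of anchor genes.
    Returns a genes x anchors matrix of Pearson correlations.
    """
    z = np.asarray(expr, dtype=float)
    z = z - z.mean(axis=1, keepdims=True)
    z = z / (np.linalg.norm(z, axis=1, keepdims=True) + 1e-12)
    return z @ z[anchor_idx].T

def pick_one_to_one(expr_a, expr_b, anchors_a, anchors_b, family_a, family_b):
    """Hypothetical helper: choose the gene pair from one orthology family
    whose coexpression profiles over shared anchors agree best.

    anchors_a[k] and anchors_b[k] are assumed to be known 1:1 orthologues.
    family_a / family_b are row indices of the family members per species.
    """
    prof_a = coexpression_profile(expr_a, anchors_a)
    prof_b = coexpression_profile(expr_b, anchors_b)
    best, best_score = None, -np.inf
    for i in family_a:
        for j in family_b:
            # Compare how gene i and gene j relate to the same anchors.
            score = np.corrcoef(prof_a[i], prof_b[j])[0, 1]
            if score > best_score:
                best, best_score = (i, j), score
    return best, best_score
```

Because each profile is computed within a single species, the two expression matrices do not need matched samples or cell types; only the anchor lists must be aligned.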
- PAR ID: 10518240
- Publisher / Repository: Nature Publishing Group
- Date Published:
- Journal Name: Nature Plants
- Volume: 10
- Issue: 7
- ISSN: 2055-0278
- Format(s): Medium: X
- Size(s): p. 1075-1080
- Sponsoring Org: National Science Foundation
More Like this
-
Abstract
We aim to enable the accurate and efficient transfer of knowledge about gene function gained from Arabidopsis thaliana and other model organisms to other plant species. This knowledge transfer is frequently challenging in plants due to duplications of individual genes and whole genomes in plant lineages. Such duplications result in complex evolutionary relationships between related genes, which may have similar sequences but highly divergent functions. In such cases, functional inference requires more than a simple sequence similarity calculation. We have developed an online resource, PhyloGenes (phylogenes.org), that displays precomputed phylogenetic trees for plant gene families along with experimentally validated function information for individual genes within the families. A total of 40 plant genomes and 10 non‐plant model organisms are represented in over 8,000 gene families. Evolutionary events such as speciation and duplication are clearly labeled on gene trees to distinguish orthologs from paralogs. Nearly 6,000 families have at least one member with an experimentally supported annotation to a Gene Ontology (GO) molecular function or biological process term. By displaying experimentally validated gene functions associated with individual genes within a tree, PhyloGenes enables functional inference for genes of uncharacterized function, based on their evolutionary relationships to experimentally studied genes, in a visually traceable manner. For the many families containing genes that have evolved to perform different functions, PhyloGenes facilitates the use of evolutionary history to determine the most likely function of genes that have not been experimentally characterized. Future work will enrich the resource by incorporating additional gene function datasets such as plant gene expression atlas data.
-
Although genome-sequence assemblies are available for a growing number of plant species, gene-expression responses to stimuli have been cataloged for only a subset of these species. Many genes show altered transcription patterns in response to abiotic stresses. However, orthologous genes in related species often exhibit different responses to a given stress. Accordingly, data on the regulation of gene expression in one species are not reliable predictors of orthologous gene responses in a related species. Here, we trained a supervised classification model to identify genes that transcriptionally respond to cold stress. A model trained with only features calculated directly from genome assemblies exhibited only modest decreases in performance relative to models trained by using genomic, chromatin, and evolution/diversity features. Models trained with data from one species successfully predicted which genes would respond to cold stress in other related species. Cross-species predictions remained accurate when training was performed in cold-sensitive species and predictions were performed in cold-tolerant species and vice versa. Models trained with data on gene expression in multiple species provided at least equivalent performance to models trained and tested in a single species and outperformed single-species models in cross-species prediction. These results suggest that classifiers trained on stress data from well-studied species may suffice for predicting gene-expression patterns in related, less-studied species with sequenced genomes.
-
Abstract
Background: Current methods for analyzing single-cell datasets have relied primarily on static gene expression measurements to characterize the molecular state of individual cells. However, capturing temporal changes in cell state is crucial for the interpretation of dynamic phenotypes such as the cell cycle, development, or disease progression. RNA velocity infers the direction and speed of transcriptional changes in individual cells, yet it is unclear how these temporal gene expression modalities may be leveraged for predictive modeling of cellular dynamics.
Results: Here, we present the first task-oriented benchmarking study that investigates integration of temporal sequencing modalities for dynamic cell state prediction. We benchmark ten integration approaches on ten datasets spanning different biological contexts, sequencing technologies, and species. We find that integrated data more accurately infers biological trajectories and achieves increased performance on classifying cells according to perturbation and disease states. Furthermore, we show that simple concatenation of spliced and unspliced molecules performs consistently well on classification tasks and can be used over more memory-intensive and computationally expensive methods.
Conclusions: This work illustrates how integrated temporal gene expression modalities may be leveraged for predicting cellular trajectories and sample-associated perturbation and disease phenotypes. Additionally, this study provides users with practical recommendations for task-specific integration of single-cell gene expression modalities.
-
Abstract
Aim: To test the latitudinal gradient in plant species diversity for self‐similarity across taxonomic scales and amongst taxa.
Location: North America.
Methods: We used species richness data from 245 local vascular plant floras to quantify the slope and shape of the latitudinal gradients in species diversity (LGSD) across all plant species as well as within each family and order. We calculated the contribution of each family and order to the empirical LGSD.
Results: We observed the canonical LGSD when all plants were considered, with floras at the lowest latitudes having, on average, 451 more species than floras at the highest latitudes. When considering slope alone, most orders and families showed the expected negative slope, but 31.7% of families and 27.7% of orders showed either no significant relationship between latitude and diversity or a reverse LGSD. Latitudinal patterns of family diversity account for at least 14% of this LGSD. Most orders and families did not show the negative slope and concave‐down quadratic shape expected by the pattern for all plant species. A majority of families did not make a significant contribution in species to the LGSD, with 53% of plant families contributing little to nothing to the overall gradient. Ten families accounted for more than 70% of the gradient. Two families, the Asteraceae and Fabaceae, contributed a third of the LGSD.
Main Conclusions: The empirical LGSD we describe here is a consequence of a gradient in the number of families and diversification within relatively few plant families. Macroecological studies typically aim to generate models that are general across taxa with the implicit assumption that the models are general within taxa. Our results strongly suggest that models of the latitudinal gradient in plant species richness that rely on environmental covariates (e.g. temperature, energy) are likely not general across plant taxa.