Title: Simulating Single-Cell Gene Expression Count Data with Preserved Gene Correlations by scDesign2
Award ID(s):
1846216
PAR ID:
10411145
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
Journal of Computational Biology
Volume:
29
Issue:
1
ISSN:
1557-8666
Page Range / eLocation ID:
23 to 26
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More like this
  1. ABSTRACT
    Motivation: Single-cell RNA sequencing (scRNA-seq) captures whole-transcriptome information of individual cells. Although scRNA-seq measures thousands of genes, researchers are often interested in only dozens to hundreds of genes for closer study, which raises the question of how to select those informative genes from scRNA-seq data. Moreover, single-cell targeted gene profiling technologies are gaining popularity for their low cost, high sensitivity, and extra (e.g. spatial) information; however, they typically measure at most a few hundred genes. Another challenging question is therefore how to select genes for targeted gene profiling based on existing scRNA-seq data.
    Results: Here, we develop the single-cell Projective Non-negative Matrix Factorization (scPNMF) method to select informative genes from scRNA-seq data in an unsupervised way. Compared with existing gene selection methods, scPNMF has two advantages. First, the informative genes it selects better distinguish cell types. Second, it enables the alignment of new targeted gene profiling data with reference data in a low-dimensional space, which facilitates cell-type prediction in the new data. Technically, scPNMF modifies the PNMF algorithm for gene selection by changing the initialization and adding a basis selection step, which selects informative bases to distinguish cell types. We demonstrate that scPNMF outperforms state-of-the-art gene selection methods on diverse scRNA-seq datasets. Moreover, we show that scPNMF can guide the design of targeted gene profiling experiments and cell-type annotation on targeted gene profiling data.
    Availability and implementation: The R package is open-access and available at https://github.com/JSB-UCLA/scPNMF. The data used in this work are available at Zenodo: https://doi.org/10.5281/zenodo.4797997.
    Supplementary information: Supplementary data are available at Bioinformatics online.
  2. Ma, Li-Jun (Ed.)
    Abstract: By introducing novel capacities and functions, new genes and gene families may play a crucial role in ecological transitions. Mechanisms generating new gene families include de novo gene birth, horizontal gene transfer, and neofunctionalization following a duplication event. The ectomycorrhizal (ECM) symbiosis is a ubiquitous mutualism that has evolved repeatedly and independently many times among fungi, but the evolutionary dynamics enabling its emergence remain elusive. We developed a phylogenetic workflow to first determine whether gene families unique to ECM Amanita fungi, and absent from closely related asymbiotic species, are functionally relevant to the symbiosis, and then to systematically infer their origins. We identified 109 gene families unique to ECM Amanita species. Compared with genes of conserved or orphan gene families, genes belonging to unique gene families are under strong purifying selection and are upregulated during symbiosis. The origins of seven of the unique gene families are strongly supported as de novo gene birth (two gene families), horizontal gene transfer (four), or gene duplication (one). An additional 34 families appear new because of their selective retention within symbiotic species. Among the 109 unique gene families, the most upregulated gene in symbiotic cultures encodes a 1-aminocyclopropane-1-carboxylate deaminase, an enzyme capable of downregulating the synthesis of the plant hormone ethylene, a common negative regulator of plant-microbial mutualisms.
  3. Zhang, Xiuwei (Ed.)
    Inferring gene regulatory networks from gene expression data is an important and challenging problem in the biology community. We propose OTVelo, a method that takes time-stamped single-cell gene expression data as input and predicts gene regulation across two time points. The rate of change of gene expression, which we refer to as gene velocity, is known to provide crucial information that enhances such inference; however, this information is not always available because of limited sequencing depth. Our algorithm overcomes this limitation by estimating gene velocities using optimal transport. We then infer gene regulation using time-lagged correlation and Granger causality via regularized linear regression. Instead of providing a network aggregated across all time points, our method uncovers the underlying dynamical mechanism across time points. We validate our algorithm on 13 simulated datasets with both synthetic and curated networks and demonstrate its efficacy on 9 experimental datasets.
  4. Abstract: Gene co-expression networks are a widely used tool for summarizing transcriptomic variation between individuals, and for inferring the transcriptional regulatory pathways that mediate genotype–phenotype relationships. However, these co-expression networks must be interpreted with caution, as they can arise from multiple processes. Here, we investigate one such process, using simulations to demonstrate that hybridization and gene flow between populations can greatly modify co-expression networks. Admixture between populations produces correlated expression between genes experiencing linkage disequilibrium. This correlated expression does not reflect functional relationships between genes but rather depends on migration rates and physical linkage on chromosomes. Given the prevalence of gene flow and hybridization between divergent populations in nature, these introgression effects likely represent a significant force in network evolution, even in populations where hybridization is historical rather than contemporary. These findings emphasize the critical importance of considering both evolutionary history and genomic architecture when analyzing gene co-expression networks in natural populations.
  5. Exploring the functions of genes and gene products is crucial to a wide range of fields, including medical research, evolutionary biology, and environmental science. However, discovering new functions largely relies on expensive and exhaustive wet-lab experiments. Existing methods of automatic function annotation or prediction mainly focus on protein function prediction using sequence, 3D structure, or protein family information. In this study, we propose to tackle the gene function prediction problem by exploring the Gene Ontology graph and annotations with BERT (GoBERT) to decipher the underlying relationships among gene functions. Our proposed function prediction task uses existing functions as inputs and generalizes function prediction to genes and gene products. Specifically, two pre-training tasks are designed to jointly train GoBERT to capture both explicit and implicit relations among functions. Neighborhood prediction is a self-supervised multi-label classification task that captures the explicit function relations. A specified masking-and-recovering task helps GoBERT find implicit patterns among functions. The pre-trained GoBERT can predict novel functions for various genes and gene products based on their known functional annotations. Extensive experiments, biological case studies, and ablation studies demonstrate the superiority of GoBERT.
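Item 1 describes scPNMF as a modification of projective non-negative matrix factorization (PNMF) used for gene selection. As a rough illustration of the underlying factorization only, the sketch below fits plain PNMF, X ≈ W W^T X with W ≥ 0, using a standard multiplicative update, then ranks genes by their weight in each basis. It assumes X is a nonnegative genes-by-cells matrix. scPNMF's modified initialization and its basis-selection step are not reproduced, and the names `pnmf` and `top_genes_per_basis` are hypothetical, not the API of the scPNMF R package.

```python
import numpy as np

def pnmf(X, k, n_iter=200, seed=0, eps=1e-10):
    """Fit projective NMF, X ≈ W @ W.T @ X with W >= 0 (genes x k).

    X: nonnegative genes x cells matrix, e.g. log-normalized counts.
    """
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], k))    # random positive start (scPNMF changes this step)
    XXt = X @ X.T                      # gene-by-gene Gram matrix, reused every iteration
    for _ in range(n_iter):
        XXtW = XXt @ W
        numer = 2.0 * XXtW
        denom = W @ (W.T @ XXtW) + XXtW @ (W.T @ W)
        W = W * numer / (denom + eps)  # multiplicative update keeps W nonnegative
        W = W / np.linalg.norm(W, 2)   # rescale to unit spectral norm for stability
    return W

def top_genes_per_basis(W, n_top):
    """For each basis (column of W), indices of its n_top highest-weighted genes."""
    return [np.argsort(W[:, j])[::-1][:n_top] for j in range(W.shape[1])]
```

With k = 1 the update settles on the leading nonnegative direction of X X^T, so on a rank-one toy matrix whose first five genes are expressed five times higher than the other five, `top_genes_per_basis(pnmf(X, k=1), 5)` returns those first five genes (in some order).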
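The first step of the workflow in item 2, finding gene families present in ECM Amanita genomes but absent from closely related asymbiotic species, amounts to a presence/absence filter. A minimal sketch, assuming gene-family membership has already been inferred for each genome (for example with an orthology-clustering tool) and stored as a mapping from species name to a set of family IDs. The `min_ecm_species` threshold, the species names, and the function name are hypothetical; the abstract does not state how many ECM species a family must occur in.

```python
from collections import Counter

def unique_families(presence, ecm_species, asymbiotic_species, min_ecm_species=1):
    """Gene families seen in >= min_ecm_species ECM genomes and in no asymbiotic genome.

    presence: dict mapping species name -> set of gene-family IDs found in that genome.
    """
    # Count how many ECM genomes carry each family (sets, so each genome counts once).
    ecm_counts = Counter(f for s in ecm_species for f in presence[s])
    # Any family seen in an asymbiotic relative is excluded.
    in_asymbiotic = set().union(*(presence[s] for s in asymbiotic_species))
    return {f for f, c in ecm_counts.items()
            if c >= min_ecm_species and f not in in_asymbiotic}
```

Raising `min_ecm_species` trades sensitivity for families that are more consistently tied to the symbiotic lineage; the later steps in item 2 (selection analysis, expression, and origin inference) would operate on the families this filter returns.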
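Item 3 outlines a two-stage pipeline: estimate per-cell velocities between two time points with optimal transport, then score regulator-target pairs with time-lagged correlation and regularized regression. The sketch below is one way to do this under stated assumptions, not OTVelo's implementation. It uses entropic optimal transport (Sinkhorn iterations with uniform cell weights and squared-Euclidean cost), takes each cell's velocity as the displacement to the barycenter of its transported mass, and uses ridge regression of velocity on current expression as a Granger-style score. The strengths `reg` and `lam` and every function name are illustrative choices.

```python
import numpy as np

def sinkhorn(C, reg, n_iter=500):
    """Entropic OT plan between uniform marginals for a cost matrix C (n0 x n1)."""
    n0, n1 = C.shape
    a = np.full(n0, 1.0 / n0)          # uniform weight on cells at time t
    b = np.full(n1, 1.0 / n1)          # uniform weight on cells at time t + 1
    K = np.exp(-C / reg)               # Gibbs kernel
    u = np.ones(n0)
    for _ in range(n_iter):            # alternate scalings to match both marginals
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]

def ot_velocity(X0, X1, dt=1.0, reg=0.05):
    """Velocity of each cell in X0 (cells x genes) toward the t + 1 population X1."""
    C = ((X0[:, None, :] - X1[None, :, :]) ** 2).sum(axis=2)  # squared distances
    cmax = C.max()
    if cmax > 0:
        C = C / cmax                   # rescale costs to [0, 1] so reg is unit-free
    P = sinkhorn(C, reg)
    targets = (P @ X1) / P.sum(axis=1, keepdims=True)  # barycentric projection
    return (targets - X0) / dt

def lagged_correlation(X, V):
    """Entry [j, k]: Pearson correlation of gene j's expression with gene k's velocity."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    Vs = (V - V.mean(axis=0)) / V.std(axis=0)
    return Xs.T @ Vs / X.shape[0]

def ridge_regulation(X, V, lam=1.0):
    """Entry [j, k]: ridge coefficient of regulator j when predicting target k's velocity."""
    Xc = X - X.mean(axis=0)
    Vc = V - V.mean(axis=0)
    G = X.shape[1]
    return np.linalg.solve(Xc.T @ Xc + lam * np.eye(G), Xc.T @ Vc)
```

Here B = ridge_regulation(X, V) is read so that B[j, k] measures how much gene j's expression at the earlier time point helps predict the change in gene k, given all other genes; that conditional reading is what makes the score Granger-like rather than a plain correlation. Smaller `reg` gives sharper transport plans at the cost of slower, less stable Sinkhorn iterations.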
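Item 4's central claim, that admixture alone can create co-expression between functionally unrelated genes, can be checked with a toy simulation. The sketch assumes two genes, each controlled only by its own cis-eQTL, with allele frequencies that differ between two source populations; each individual's alleles are drawn independently given its ancestry fraction. It captures only the ancestry-driven component of linkage disequilibrium. Physical linkage, migration rates, and multi-generation dynamics from the original study are not modeled, and the frequencies, effect size, noise level, and function name are arbitrary, hypothetical choices.

```python
import numpy as np

def simulate_expression(ancestry, p_pop1, p_pop2, effect=1.0, noise_sd=0.5, seed=0):
    """Expression of two genes, each driven only by its own unlinked cis-eQTL.

    ancestry: per-individual fraction of the genome from source population 1.
    p_pop1, p_pop2: eQTL allele frequencies (one per gene) in the two source populations.
    Returns an (individuals x 2) expression matrix.
    """
    rng = np.random.default_rng(seed)
    q = np.asarray(ancestry, dtype=float)[:, None]
    # Each individual's expected allele frequency mixes the two source populations.
    p = q * np.asarray(p_pop1, dtype=float) + (1.0 - q) * np.asarray(p_pop2, dtype=float)
    # Given ancestry, the two loci are sampled independently (no physical linkage).
    genotypes = rng.binomial(2, p)
    # Additive eQTL effect plus independent noise; gene A never influences gene B.
    return effect * genotypes + rng.normal(0.0, noise_sd, size=genotypes.shape)
```

With uniformly distributed ancestry and frequencies of 0.9 versus 0.1, the expected correlation between the two genes is about 0.25 in the admixed sample and zero in a single unadmixed population, even though neither gene affects the other: variation in ancestry moves both eQTL genotypes together.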
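Item 5 pre-trains GoBERT on two tasks over a gene's set of GO annotations: neighborhood prediction (a multi-label target) and masked-term recovery. The sketch below only builds training examples for those two tasks; the model itself is omitted. It assumes the GO graph is given as (child, parent) edges, that a term's neighborhood means its direct parents and children, and a BERT-style default masking rate of 15%. These definitions, the placeholder term IDs, and the function names are hypothetical rather than taken from the paper.

```python
import random

MASK = "[MASK]"

def neighborhood_targets(terms, go_edges, vocab):
    """Multi-hot label over vocab: 1 for every direct parent or child of an input term.

    go_edges: iterable of (child, parent) GO term pairs.
    """
    neighbors = set()
    for child, parent in go_edges:
        if child in terms:
            neighbors.add(parent)
        if parent in terms:
            neighbors.add(child)
    return [1 if t in neighbors else 0 for t in vocab]

def mask_terms(terms, mask_rate=0.15, seed=0):
    """Hide a random subset of a gene's GO terms; return (inputs, labels).

    labels holds the original term at masked positions and None elsewhere,
    so a model is trained to recover only the hidden terms.
    """
    rng = random.Random(seed)
    n_mask = min(len(terms), max(1, round(mask_rate * len(terms))))
    masked = set(rng.sample(range(len(terms)), n_mask))
    inputs = [MASK if i in masked else t for i, t in enumerate(terms)]
    labels = [t if i in masked else None for i, t in enumerate(terms)]
    return inputs, labels
```

The neighborhood target encodes relations stated explicitly in the ontology graph, while the masking task forces a model to infer a hidden term from the other annotations of the same gene, which is where implicit co-annotation patterns can be learned.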