The transition from fertilized egg to larva in fish is accompanied by various biological processes. We selected seven early developmental stages of channel catfish, Ictalurus punctatus, for transcriptome analysis and covered 22,635 genes with 590 million high-quality RNA-sequencing (RNA-seq) reads. Differential expression analysis between neighboring developmental timepoints revealed significantly enriched biological categories associated with growth, development and morphogenesis, most evident in the comparisons of 2 vs. 5 days post fertilization (dpf) and 5 vs. 6 dpf. A gene co-expression network was constructed using the Weighted Gene Co-expression Network Analysis (WGCNA) approach, and four critical modules were identified. Among candidate hub genes, GDF10, FOXA2, HCEA and SYCE3 were involved in head formation, egg development and the transverse central element of synaptonemal complexes. CK1, OAZ2, DARS1 and UBE2V2 were mainly associated with regulation of the cell cycle, growth, brain development, and the differentiation and proliferation of enterocytes. IFI44L and ZIP10 were critical for the regulation of immune activity and ion transport. Additionally, TCK1 and TGFB1 were related to phosphate transport and the regulation of cell proliferation. All these genes play vital roles in embryogenesis and the regulation of early development. These results provide a rich dataset for functional genomic studies, and our work offers new insights into the mechanisms underlying early development in channel catfish.
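The hub-gene step can be illustrated with a minimal sketch. It assumes an unsigned, WGCNA-style soft-threshold adjacency a_ij = |r_ij|^β with β = 6 (a common default, not necessarily the value used in this study) and ranks genes by intramodular connectivity, the sum of a gene's adjacencies to the other members of its module. Published WGCNA analyses often also weigh module membership and gene significance, which this sketch omits. The gene names and the seven-timepoint profiles are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def intramodular_connectivity(profiles, module, beta=6):
    """k_in(g): sum of soft-thresholded adjacencies |r| ** beta from gene g
    to every other gene in the same module (unsigned network)."""
    return {
        g: sum(abs(pearson(profiles[g], profiles[h])) ** beta
               for h in module if h != g)
        for g in module
    }


def hub_genes(profiles, module, beta=6, top=1):
    """Return the `top` genes with the highest intramodular connectivity."""
    k = intramodular_connectivity(profiles, module, beta)
    return sorted(k, key=k.get, reverse=True)[:top]


# Hypothetical profiles across seven developmental timepoints.
profiles = {
    "hub": [1, 2, 3, 4, 5, 6, 7],
    "g1": [2, 1, 3, 4, 5, 5, 8],
    "g2": [1, 2, 4, 2, 6, 6, 7],
}
print(hub_genes(profiles, ["hub", "g1", "g2"]))  # ['hub']
```

Here "hub" correlates strongly with both other genes, while g1 and g2 correlate less with each other, so "hub" accumulates the largest connectivity.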
A Novel Calibration Step in Gene Co-Expression Network Construction
High-throughput technologies such as DNA microarrays and RNA-sequencing are used to measure the expression levels of large numbers of genes simultaneously. To support the extraction of biological knowledge, individual gene expression levels are transformed into Gene Co-expression Networks (GCNs). In a GCN, nodes correspond to genes, and the weight of the connection between two nodes is a measure of similarity in the expression behavior of the two genes. In general, GCN construction and analysis involves three steps: 1) calculating a similarity value for each pair of genes; 2) using these similarity values to construct a fully connected weighted network; and 3) finding clusters of genes in the network, commonly called modules. The specific implementation of these three steps can significantly affect the final output and the downstream biological analysis. GCN construction is a well-studied topic, yet existing algorithms rely on relatively simple statistical and mathematical tools to implement these steps. Currently, the software package WGCNA appears to be the most widely accepted standard. We hypothesize that the raw features provided by sequencing data can be leveraged to extract modules of higher quality. We introduce a novel preprocessing step for the gene expression data set that, in effect, calibrates the expression levels of individual genes before pairwise similarities are computed. Further, the similarity is computed as an inner product of positive vectors. In experiments, this yields a significant improvement over WGCNA, as measured by aggregate p-values of the gene ontology term enrichment of the computed modules.
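The three generic steps can be sketched as follows. This is a conventional WGCNA-style baseline, not the calibrated inner-product method proposed here, whose details are not given in this abstract: Pearson correlation as the pairwise similarity (step 1), an unsigned soft-threshold adjacency |r|^β as the fully connected weighted network (step 2), and average-linkage hierarchical clustering on the dissimilarity 1 - a_ij as the module finder (step 3). WGCNA itself clusters a topological-overlap dissimilarity and cuts the tree dynamically; the fixed `cut` threshold, β = 6, and the toy profiles are assumptions for illustration.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def similarity(profiles):
    """Step 1: Pearson similarity for every unordered gene pair."""
    return {(a, b): pearson(profiles[a], profiles[b])
            for a, b in combinations(profiles, 2)}


def soft_threshold(sim, beta=6):
    """Step 2: fully connected weighted network, a_ij = |r_ij| ** beta."""
    return {pair: abs(r) ** beta for pair, r in sim.items()}


def find_modules(genes, adj, cut=0.5):
    """Step 3: average-linkage clustering on 1 - a_ij; stop merging once the
    closest pair of clusters is at least `cut` apart."""
    def dist(a, b):
        return 1.0 - (adj[(a, b)] if (a, b) in adj else adj[(b, a)])

    clusters = [[g] for g in genes]
    while len(clusters) > 1:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            d = sum(dist(a, b) for a in clusters[i] for b in clusters[j])
            d /= len(clusters[i]) * len(clusters[j])
            if best is None or d < best[0]:
                best = (d, i, j)
        d, i, j = best
        if d >= cut:
            break
        clusters[i].extend(clusters.pop(j))  # j > i, so index i is unaffected
    return clusters


# Hypothetical profiles: a1/a2 share a rising trend, b1/b2 an alternating one.
profiles = {
    "a1": [1, 2, 3, 4, 5, 6, 7],
    "a2": [2, 1, 3, 4, 5, 5, 8],
    "b1": [5, 3, 5, 3, 5, 3, 5],
    "b2": [6, 2, 6, 2, 6, 3, 6],
}
adj = soft_threshold(similarity(profiles))
print(find_modules(list(profiles), adj))  # [['a1', 'a2'], ['b1', 'b2']]
```

The calibration step described above would act on `profiles` before step 1, and the proposed inner-product similarity would replace `pearson` there; both are left out because the abstract does not specify them.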
- Award ID(s): 2039863
- PAR ID: 10426017
- Journal Name: Frontiers in Bioinformatics
- Volume: 1
- ISSN: 2673-7647
- Sponsoring Org: National Science Foundation
More Like This
-
Cherry, J M (Ed.)
The mechanisms that coordinate cellular gene expression are highly complex and intricately interconnected. Thus, it is necessary to move beyond a purely reductionist approach to understanding genetic information flow and to focus on the networked connections between genes that organize cellular function. Continued advances in computational hardware, coupled with the development of gene correlation network algorithms, provide the capacity to study networked interactions between genes rather than their isolated functions. For example, gene coexpression networks are built from pairwise gene relationships measured with correlation metrics such as Pearson (linear) or Spearman (rank-based) correlation. Recently, tools have been designed to deepen these analyses by distinguishing intrinsic from extrinsic noise in gene expression values, identifying different modules based on tissue phenotype, and capturing potential nonlinear relationships. In this report, we introduce an algorithm that applies image-based segmentation, specifically blob detection techniques, to detect bigenic edges in a gene expression matrix. We applied this algorithm, called EdgeCrafting, to a bulk RNA-sequencing gene expression matrix composed of healthy and cancerous kidney data. We then compared EdgeCrafting against four other RNA expression analysis techniques: Weighted Gene Correlation Network Analysis, Knowledge Independent Network Construction, NetExtractor, and differential gene expression analysis.
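The difference between the linear and rank-based correlation metrics mentioned in this abstract can be shown with a small sketch; it is not EdgeCrafting, whose blob-detection method is not detailed here. Spearman correlation is Pearson correlation applied to ranks, so it scores any monotonic relationship, including a nonlinear one such as y = x³, as perfect, whereas Pearson does not. The helper names are illustrative.

```python
import math


def pearson(x, y):
    """Pearson (linear) correlation between two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(v):
    """1-based ranks; tied values receive the average of their positions."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman(x, y):
    """Spearman (rank) correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]         # monotonic but nonlinear
print(round(pearson(x, y), 3))   # 0.938
print(round(spearman(x, y), 3))  # 1.0
```

Neither metric detects a non-monotonic dependence such as a U-shaped profile, which is one motivation for the nonlinear approaches the abstract mentions.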
-
Two graph-theoretic concepts, cliques and bipartite graphs, are explored to identify network biomarkers for cancer at the gene network level. The rationale is that a group of genes works together, forming clusters or clique-like structures, to initiate a cancer. After initiation, the disease signal passes to the next group of genes, related to the second stage of the cancer, which can be represented as a bipartite graph. In other words, bipartite graphs represent the cross-talk among genes between two disease stages. To test this hypothesis, gene expression values for three cancers are analyzed: breast invasive carcinoma (BRCA), colorectal adenocarcinoma (COAD) and glioblastoma multiforme (GBM). First, a co-expression gene network is generated from highly correlated gene pairs with a Pearson correlation coefficient ≥ 0.9. Second, clique structures of all sizes are isolated from the co-expression network. Then, by combining these cliques, three different biomarker modules are developed: maximal clique-like modules, 2-clique-1-bipartite modules, and 3-clique-2-bipartite modules. The biomarker genes discovered from these network modules are validated as essential genes for causing cancer in terms of network properties and survival analysis. This list of biomarker genes will help biologists design wet-lab experiments to further elucidate the complex mechanisms of cancer.
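The first two steps, thresholding gene pairs at a Pearson coefficient ≥ 0.9 and isolating cliques, can be sketched as below. The abstract does not say which clique algorithm was used; this sketch assumes the Bron-Kerbosch algorithm with pivoting for maximal-clique enumeration, and the gene names and profiles are hypothetical. As stated in the criterion, only positive correlations pass the threshold. For real networks, a library routine such as NetworkX's `find_cliques` would typically be preferred.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def coexpression_graph(profiles, threshold=0.9):
    """Keep an edge only for gene pairs whose Pearson r is >= threshold."""
    adj = {g: set() for g in profiles}
    for a, b in combinations(profiles, 2):
        if pearson(profiles[a], profiles[b]) >= threshold:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def maximal_cliques(adj):
    """Enumerate maximal cliques with the Bron-Kerbosch algorithm (pivoting)."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(r)  # r cannot be extended, so it is maximal
            return
        # Pick the pivot covering the most candidates to prune branches.
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


profiles = {
    "g1": [1, 2, 3, 4, 5, 6, 7],
    "g2": [1.5, 1.5, 3, 4, 5, 5.5, 7.5],
    "g3": [1, 2, 3.5, 3, 5.5, 6, 7],
    "g4": [1, 2, 4.5, 1, 6.5, 6, 7],
    "g5": [5, 3, 5, 3, 5, 3, 5],
}
cliques = maximal_cliques(coexpression_graph(profiles))
print(sorted(sorted(c) for c in cliques))  # [['g1', 'g2', 'g3'], ['g3', 'g4'], ['g5']]
```

Note that g4 correlates with g3 above the threshold but not with g1 or g2, so it forms a separate clique that shares g3; overlapping cliques of this kind are the raw material for the combined clique and bipartite modules described above.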
-
Plants have evolved complex sensory systems to recognize signals from multiple environmental conditions. Light is one of the most important environmental factors, regulating not only photomorphogenesis but also the developmental strategy of plants throughout their life cycle. The molecular mechanisms of light signaling modules and the interactions between light and other environmental signals have been studied extensively. However, to enhance plant growth, particularly in crop production, we need a deeper understanding of how light regulates plant development within gene regulatory networks (GRNs). Understanding GRNs is important for identifying not only novel genes and transcription factors in light signaling pathways but also the factors that connect light signaling with other environmental signals. Weighted gene co-expression network analysis (WGCNA) has been used to study GRNs. We applied WGCNA to 58 RNA-seq samples of wild-type Arabidopsis grown under different light treatments and built gene co-expression networks. We identified 14 modules that are significantly associated with different light treatments. Among them, the honeydew1 and ivory modules display significant association with dark-grown seedlings. Many hub genes identified from these modules are significantly enriched in light responses, including responses to red, far-red and blue light and to light stimulus, as well as auxin responses and photosynthesis. Although we found many known transcription factors in these modules, we also identified several unknown genes and transcription factors that are significantly associated with the honeydew1 module and highly differentially expressed between dark and light conditions. To examine whether the hub genes in the honeydew1 module play a role in light signaling, we isolated mutants in selected hub genes and measured hypocotyl lengths under dark, red, and far-red light conditions.
These assays showed that four hub genes are involved in regulating light signaling pathways. This study provides a new approach to identifying novel genes in GRNs underlying light responses in Arabidopsis.
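Module-trait association in WGCNA is usually measured by correlating each module's eigengene (the first principal component of its standardized expression) with a sample trait. The sketch below substitutes a simpler module summary, the per-sample mean of z-scored profiles, to stay dependency-free; it is an approximation, not the WGCNA computation. The genes, the six samples and the 0/1 dark-versus-light encoding are hypothetical.

```python
import math
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation between two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def zscore(v):
    """Standardize one gene's profile across samples."""
    m, s = mean(v), pstdev(v)
    return [(a - m) / s for a in v]


def module_summary(profiles, module):
    """Simplified stand-in for a module eigengene: per-sample mean of the
    module's z-scored profiles (WGCNA uses the first principal component)."""
    zs = [zscore(profiles[g]) for g in module]
    return [mean(col) for col in zip(*zs)]


def module_trait_cor(profiles, module, trait):
    """Pearson correlation between the module summary and a sample trait."""
    return pearson(module_summary(profiles, module), trait)


# Six hypothetical samples: 1 = dark-grown, 0 = light-grown.
trait = [1, 1, 1, 0, 0, 0]
profiles = {
    "geneX": [9, 8, 10, 2, 3, 1],
    "geneY": [7, 9, 8, 3, 1, 2],
    "geneZ": [1, 5, 3, 1, 5, 3],
}
print(round(module_trait_cor(profiles, ["geneX", "geneY"], trait), 2))  # 0.99
```

A module built from geneZ, whose expression does not differ between the two groups, yields a correlation of essentially zero with the same trait.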
-
Gene co-expression networks (GCNs) provide multiple benefits to molecular research, including hypothesis generation and biomarker discovery. Transcriptome profiles serve as input for GCN construction and are derived from increasingly large studies with samples across multiple experimental conditions, treatments, time points, genotypes, etc. In such experiments, larger numbers of variables confound the discovery of true network edges, exclude edges, and inhibit the discovery of context-specific (or condition-specific) network edges. To demonstrate this problem, a 475-sample dataset is used to show that up to 97% of GCN edges can be misleading because the underlying correlations are false or incorrect. False and incorrect correlations can occur when tests are applied without ensuring that their assumptions are met, and pairwise gene expression may violate test assumptions if the expression of at least one gene in the pair is a function of multiple confounding variables. The 'one-size-fits-all' approach to GCN construction is therefore problematic for large, multivariable datasets. Recently, the Knowledge Independent Network Construction toolkit has been used in multiple studies to provide a dynamic approach to GCN construction that ensures statistical tests meet their assumptions and that confounding variables are addressed. Additionally, it can associate experimental context with each edge of the network, resulting in context-specific GCNs (csGCNs). To help researchers recognize such challenges in GCN construction and the creation of csGCNs, we provide a review of the workflow.
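The confounding problem described above can be reproduced with a toy example: two genes that are uncorrelated within each of two conditions appear almost perfectly correlated when the conditions are pooled, because both genes shift with the condition. The condition labels and expression values are hypothetical, and the sketch illustrates only the problem, not the Knowledge Independent Network Construction workflow itself.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def per_condition(x, y, labels):
    """Pearson r computed separately within each experimental condition."""
    result = {}
    for c in sorted(set(labels)):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        result[c] = pearson([x[i] for i in idx], [y[i] for i in idx])
    return result


# Hypothetical expression of two genes in two conditions (A, B).
labels = ["A"] * 4 + ["B"] * 4
gene1 = [1, 2, 1, 2, 11, 12, 11, 12]
gene2 = [2, 1, 1, 2, 12, 11, 11, 12]

print(per_condition(gene1, gene2, labels))  # {'A': 0.0, 'B': 0.0}
print(round(pearson(gene1, gene2), 3))      # 0.99 (pooled)
```

The pooled correlation reflects only the shared condition effect, so an edge drawn from it would be misleading; testing within each context (or modeling the condition explicitly) avoids this.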

